Abstract

The problem central to sparse recovery and compressive sensing is that of stable sparse recovery: we want a distribution $\mathcal{A}$ of matrices $A \in \mathbb{R}^{m \times n}$ such that, for any $x \in \mathbb{R}^n$ and with probability $1 - \delta > 2/3$ over $A \in \mathcal{A}$, there is an algorithm to recover $\hat{x}$ from $Ax$ with

$$\|\hat{x} - x\|_p \le C \min_{k\text{-sparse } x'} \|x - x'\|_p$$

for some constant $C > 1$ and norm $p$.

The measurement complexity of this problem is well understood for constant $C > 1$. However, in a variety of applications it is important to obtain $C = 1 + \epsilon$ for a small $\epsilon > 0$, and this complexity is not well understood. We resolve the dependence on $\epsilon$ in the number of measurements required of a $k$-sparse recovery algorithm, up to polylogarithmic factors for the central cases of $p = 1$ and $p = 2$. Namely, we give new algorithms and lower bounds that show the number of measurements required is $k/\epsilon^{p/2}\,\mathrm{polylog}(n)$. For $p = 2$, our bound of $\frac{1}{\epsilon}k\log(n/k)$ is tight up to constant factors. We also give matching bounds when the output is required to be $k$-sparse, in which case we achieve $k/\epsilon^{p}\,\mathrm{polylog}(n)$. This shows the distinction between the complexity of sparse and non-sparse outputs is fundamental.

1 Introduction 

Over the last several years, substantial interest has been generated in the problem of solving underdetermined linear systems subject to a sparsity constraint. The field, known as compressed sensing or sparse recovery, has applications to a wide variety of fields that include data stream algorithms [Mut05], medical or geological imaging [CRT06, Don06], and genetics testing [SAZ10]. The approach uses the power of a sparsity constraint: a vector $x'$ is $k$-sparse if at most $k$ coefficients are non-zero. A standard formulation for the problem is that of stable sparse recovery: we want a distribution $\mathcal{A}$ of matrices $A \in \mathbb{R}^{m \times n}$ such that, for any $x \in \mathbb{R}^n$ and with probability $1 - \delta > 2/3$ over $A \in \mathcal{A}$, there is an algorithm to recover $\hat{x}$ from $Ax$ with

$$\|\hat{x} - x\|_p \le C \min_{k\text{-sparse } x'} \|x - x'\|_p$$

for some constant $C > 1$ and norm $p$.¹ We call this a $C$-approximate $\ell_p/\ell_p$ recovery scheme with failure probability $\delta$. We refer to the elements of $Ax$ as measurements.
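The guarantee above can be checked directly for a candidate output. The following minimal Python sketch assumes vectors are plain lists of floats. The helper names `best_k_term_error`, `lp_distance` and `is_c_approximate` are illustrative and do not come from the paper. The sketch relies on the fact that, in any $\ell_p$ norm, the best $k$-sparse approximation of $x$ keeps the $k$ largest-magnitude entries.

```python
def best_k_term_error(x, k, p):
    """min over k-sparse x' of ||x - x'||_p.

    The minimizer keeps the k largest-magnitude entries of x, so the error
    is the l_p norm of the remaining "tail" entries."""
    tail = sorted((abs(v) for v in x), reverse=True)[k:]
    return sum(t ** p for t in tail) ** (1.0 / p)


def lp_distance(a, b, p):
    """||a - b||_p for equal-length lists."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def is_c_approximate(x_hat, x, k, p, C):
    """True if x_hat meets the C-approximate l_p/l_p recovery guarantee for x."""
    return lp_distance(x_hat, x, p) <= C * best_k_term_error(x, k, p)


# Example: two large coordinates plus small noise, k = 2, p = 1.
x = [5.0, -4.0, 0.1, 0.2, -0.1]
tail_error = best_k_term_error(x, 2, 1)      # 0.2 + 0.1 + 0.1
good = is_c_approximate([5.0, -4.0, 0.0, 0.0, 0.0], x, 2, 1, 1.1)
bad = is_c_approximate([0.0] * 5, x, 2, 1, 1.1)
```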

It is known [CRT06, GLPS10] that such recovery schemes exist for $p \in \{1, 2\}$ with $C = O(1)$ and $m = O(k\log\frac{n}{k})$. Furthermore, it is known [DIPW10, FPRU10] that any such recovery scheme requires $\Omega(k\log_{1+C}\frac{n}{k})$ measurements. This means the measurement complexity is well understood for $C = 1 + \Omega(1)$, but not for $C = 1 + o(1)$.

¹Some formulations allow the two norms to be different, in which case $C$ is not constant. We only consider equal norms in this paper.
Figure 1: Our results, along with existing upper bounds. Fairly minor restrictions on the relative magnitude of parameters apply; see the theorem statements for details.

A number of applications would like to have $C = 1 + \epsilon$ for small $\epsilon$. For example, a radio wave signal can be modeled as $x = x^* + w$ where $x^*$ is $k$-sparse (corresponding to a signal over a narrow band) and the noise $w$ is i.i.d. Gaussian with $\|w\|_p \approx D\|x^*\|_p$ [TDB09]. Then sparse recovery with $C = 1 + \alpha/D$ allows the recovery of a $(1 - \alpha)$ fraction of the true signal $x^*$. Since $x^*$ is concentrated in a small band while $w$ is located over a large region, it is often the case that $\alpha/D \ll 1$.

The difficulty of $(1+\epsilon)$-approximate recovery has seemed to depend on whether the output $x'$ is required to be $k$-sparse or can have more than $k$ elements in its support. Having $k$-sparse output is important for some applications (e.g., the aforementioned radio waves) but not for others (e.g., imaging). Algorithms that output a $k$-sparse $x'$ have used $\Theta(\frac{1}{\epsilon^p}k\log n)$ measurements [CCF02, CM04, CM06, Wai09]. In contrast, [GLPS10] uses only $\Theta(\frac{1}{\epsilon}k\log(n/k))$ measurements for $p = 2$ and outputs a non-$k$-sparse $x'$.

Our results. We show that the apparent distinction between complexity of sparse and non-sparse outputs is fundamental, for both $p = 1$ and $p = 2$. We show that for sparse output, $\Omega(k/\epsilon^p)$ measurements are necessary, matching the upper bounds up to a $\log n$ factor. For general output and $p = 2$, we show $\Omega(\frac{1}{\epsilon}k\log(n/k))$ measurements are necessary, matching the upper bound up to a constant factor. In the remaining case of general output and $p = 1$, we show $\tilde{\Omega}(k/\sqrt{\epsilon})$ measurements are necessary. We then give a novel algorithm that uses $O(\frac{\log^3(1/\epsilon)}{\sqrt{\epsilon}}k\log n)$ measurements, beating the $1/\epsilon$ dependence given by all previous algorithms. As a result, all our bounds are tight up to factors logarithmic in $n$. The full results are shown in Figure 1.

In addition, for $p = 2$ and general output, we show that thresholding the top $2k$ elements of a Count-Sketch [CCF02] estimate gives $(1+\epsilon)$-approximate recovery with $\Theta(\frac{1}{\epsilon}k\log n)$ measurements. This is interesting because it highlights the distinction between sparse output and non-sparse output: [CM06] showed that thresholding the top $k$ elements of a Count-Sketch estimate requires $m = \Theta(\frac{1}{\epsilon^2}k\log n)$. While [GLPS10] achieves $m = \Theta(\frac{1}{\epsilon}k\log(n/k))$ for the same regime, it only succeeds with constant probability while ours succeeds with probability $1 - n^{-\Omega(1)}$; hence ours is the most efficient known algorithm when $\delta = o(1)$, $\epsilon = o(1)$, and $k < n^{0.9}$.

Related work. Much of the work on sparse recovery has relied on the Restricted Isometry Property [CRT06]. None of this work has been able to get better than 2-approximate recovery, so there are relatively few papers achieving $(1+\epsilon)$-approximate recovery. The existing ones with $O(k\log n)$ measurements are surveyed above (except for [IR08], which has worse dependence on $\epsilon$ than [CM04] for the same regime).

A couple of previous works have studied the $\ell_\infty/\ell_p$ problem, where every coordinate must be estimated with small error. This problem is harder than $\ell_p/\ell_p$ sparse recovery with sparse output. For $p = 2$, [Wai09] showed that schemes using Gaussian matrices $A$ require $m = \Omega(\frac{1}{\epsilon^2}k\log(n/k))$. For $p = 1$, [CM05] showed that any sketch requires $\Omega(k/\epsilon)$ bits (rather than measurements).
Independently of this work and of each other, multiple authors [CD11, IT10, ASZ10] have matched our $\Omega(\frac{1}{\epsilon}k\log(n/k))$ bound for $\ell_2/\ell_2$ in related settings. The details vary, but all proofs are broadly similar in structure to ours: they consider observing a large set of "well-separated" vectors under Gaussian noise. Fano's inequality gives a lower bound on the mutual information between the observation and the signal; then, an upper bound on the mutual information is given by either the Shannon-Hartley theorem or a KL-divergence argument. This technique does not seem useful for the other problems we consider in this paper, such as lower bounds for $\ell_1/\ell_1$ or the sparse output setting.

Our techniques. For the upper bounds for non-sparse output, we observe that the hard case for sparse output is when the noise is fairly concentrated, in which case the estimation of the top $k$ elements can have $\sqrt{\epsilon}$ error. Our goal is to recover enough mass from outside the top $k$ elements to cancel this error. The upper bound for $p = 2$ is a fairly straightforward analysis of the top $2k$ elements of a Count-Sketch data structure.

The upper bound for $p = 1$ proceeds by subsampling the vector at rate $2^{-i}$ and performing a Count-Sketch with size proportional to $k/\sqrt{\epsilon}$, for $i \in \{0, 1, \ldots, O(\log(1/\epsilon))\}$. The intuition is that if the noise is well spread over many (more than $k/\epsilon^{3/2}$) coordinates, then the $\ell_2$ bound from the first Count-Sketch gives a very good $\ell_1$ bound, so the approximation is $(1+\epsilon)$-approximate. However, if the noise is concentrated over a small number $k/\epsilon^c$ of coordinates, then the error from the first Count-Sketch is proportional to $1 + \epsilon^{c/2 + 1/4}$. But in this case, one of the subsamples will only have $O(k/\epsilon^{c/2 - 1/4}) \le k/\sqrt{\epsilon}$ of the coordinates with large noise. We can then recover those coordinates with the Count-Sketch for that subsample. Those coordinates contain an $\epsilon^{c/2 + 1/4}$ fraction of the total noise, so recovering them decreases the approximation error by exactly the error induced from the first Count-Sketch.

The lower bounds use substantially different techniques for sparse output and for non-sparse output. For sparse output, we use reductions from communication complexity to show a lower bound in terms of bits. Then, as in [DIPW10], we embed $\Theta(\log n)$ copies of this communication problem into a single vector. This multiplies the bit complexity by $\log n$; we also show we can round $Ax$ to $\log n$ bits per measurement without affecting recovery, giving a lower bound in terms of measurements.

We illustrate the lower bound on bit complexity for sparse output using $k = 1$. Consider a vector $x$ containing $1/\epsilon^p$ ones and zeros elsewhere, such that $x_{2i} + x_{2i+1} = 1$ for all $i$. For any $i$, set $z_{2i} = z_{2i+1} = 1$ and $z_j = 0$ elsewhere. Then successful $(1+\epsilon/3)$-approximate sparse recovery from $A(x + z)$ returns $\hat{z}$ with $\mathrm{supp}(\hat{z}) = \mathrm{supp}(x) \cap \{2i, 2i+1\}$. Hence we can recover each bit of $x$ with probability $1 - \delta$, requiring $\Omega(1/\epsilon^p)$ bits.² We can generalize this to $k$-sparse output for $\Omega(k/\epsilon^p)$ bits, and to $\delta$ failure probability with $\Omega(\frac{1}{\epsilon^p}\log\frac{1}{\delta})$ bits. However, the two generalizations do not seem to combine.
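For $p = 1$, the gap that forces recovery to reveal a bit can be checked numerically. The sketch below is a minimal illustration. It assumes the pair layout described above with $N = 1/\epsilon$ pairs, so $x$ has $N$ ones. `make_instance`, `l1_cost_at` and `decode_bit` are hypothetical helpers. The sketch shows that the best 1-sparse fit of $x + z$ costs $N$ in $\ell_1$, while every 1-sparse fit placed anywhere else costs at least $N + 1$, a factor $1 + \epsilon$ more.

```python
def make_instance(bits, i):
    """x stores one bit per pair (x[2j], x[2j+1]) = (b_j, 1 - b_j); z adds 1 at 2i and 2i+1."""
    v = []
    for b in bits:
        v.extend([b, 1 - b])
    v[2 * i] += 1
    v[2 * i + 1] += 1
    return v


def l1_cost_at(v, j):
    """Smallest l_1 error of a 1-sparse output supported on index j.

    For a fixed index the best value is v[j] itself, leaving the rest of v."""
    return sum(abs(t) for t in v) - abs(v[j])


def decode_bit(v, i):
    """Recover b_i: the optimal 1-sparse output sits on the entry equal to 2."""
    return 1 if v[2 * i] > v[2 * i + 1] else 0


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]      # N = 10 pairs, i.e. eps = 0.1
v = make_instance(bits, 3)
best = min(range(len(v)), key=lambda j: l1_cost_at(v, j))
best_cost = l1_cost_at(v, best)
runner_up = min(l1_cost_at(v, j) for j in range(len(v)) if j != best)
```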

For non-sparse output, we split between $\ell_2$ and $\ell_1$. In $\ell_2$, we consider $A(x + w)$ where $x$ is sparse and $w$ has uniform Gaussian noise with $\|w\|_2^2 \approx \|x\|_2^2/\epsilon$. Then each coordinate of $y = A(x + w) = Ax + Aw$ is a Gaussian channel with signal-to-noise ratio $\epsilon$. This channel has channel capacity $O(\epsilon)$, showing $I(y; x) = O(\epsilon m)$. Correct sparse recovery must either get most of $x$ or an $\epsilon$ fraction of $w$; the latter requires $m = \Omega(\epsilon n)$ and the former requires $I(y; x) = \Omega(k\log(n/k))$. This gives a tight $\Omega(\frac{1}{\epsilon}k\log(n/k))$ result. Unfortunately, this does not easily extend to $\ell_1$, because it relies on the Gaussian distribution being both stable and maximum entropy under $\ell_2$; the corresponding distributions in $\ell_1$ are not the same.

Therefore for $\ell_1$ non-sparse output, we have yet another argument. The hard instances for $k = 1$ must have one large value (or else $0$ is a valid output) but small other values (or else the 2-sparse approximation is significantly better than the 1-sparse approximation). Suppose $x$ has one value of size $\epsilon$ and $d$ values of size $1/d$ spread through a vector of size $d^2$. Then a $(1+\epsilon/2)$-approximate recovery scheme must either locate the large element or guess the locations of the $d$ values with $\Omega(\epsilon d)$ more correct than incorrect. The former requires $1/(d\epsilon^2)$ bits by the difficulty of a novel version of the Gap-$\ell_\infty$ problem. The latter requires $\epsilon d$ bits because it allows recovering an error correcting code. Setting $d = \epsilon^{-3/2}$ balances the terms at $\epsilon^{-1/2}$ bits. Because some of these reductions are very intricate, this extended abstract does not manage to embed $\log n$ copies of the problem into a single vector. As a result, we lose a $\log n$ factor in a universe of size $n = \mathrm{poly}(k/\epsilon)$ when converting to measurement complexity from bit complexity.

²For $p = 1$, we can actually set $|\mathrm{supp}(z)| = 1/\epsilon$ and search among a set of $1/\epsilon$ candidates. This gives $\Omega(\frac{1}{\epsilon}\log(1/\epsilon))$ bits.



2 Preliminaries 

Notation. We use $[n]$ to denote the set $\{1, \ldots, n\}$. For any set $S \subset [n]$, we use $\overline{S}$ to denote the complement of $S$, i.e., the set $[n] \setminus S$. For any $x \in \mathbb{R}^n$, $x_i$ denotes the $i$th coordinate of $x$, and $x_S$ denotes the vector $x' \in \mathbb{R}^n$ given by $x'_i = x_i$ if $i \in S$, and $x'_i = 0$ otherwise. We use $\mathrm{supp}(x)$ to denote the support of $x$.



3 Upper bounds 

The algorithms in this section are indifferent to permutation of the coordinates. Therefore, for simplicity of notation in the analysis, we assume the coefficients of $x$ are sorted such that $|x_1| \ge |x_2| \ge \cdots \ge |x_n| \ge 0$.



Count-Sketch. Both our upper bounds use the Count-Sketch [CCF02] data structure. The structure consists of $c\log n$ hash tables of size $O(q)$, for $O(cq\log n)$ total space; it can be represented as $Ax$ for a matrix $A$ with $O(cq\log n)$ rows. Given $Ax$, one can construct $x^*$ with

$$\|x^* - x\|_\infty^2 \le \frac{1}{q}\|x_{\overline{[q]}}\|_2^2 \tag{3}$$

with failure probability $n^{1-c}$.
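A toy version of the Count-Sketch structure may help make (3) concrete. This minimal Python sketch uses explicit random hash and sign tables; a real implementation would use pairwise-independent hash functions. The class and method names are illustrative. Each row adds $s_j(i)\,x_i$ into bucket $h_j(i)$, which is a linear map of $x$. The estimate of $x_i$ is the median over rows of $s_j(i)$ times its bucket.

```python
import random
import statistics


class CountSketch:
    """`rows` hash tables with `width` buckets each; the tables equal A x for a
    0/+1/-1 matrix A with rows * width rows."""

    def __init__(self, n, width, rows, seed=0):
        rng = random.Random(seed)
        self.h = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.s = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
        self.tables = [[0.0] * width for _ in range(rows)]

    def add(self, x):
        """Linear update: tables += A x."""
        for h, s, table in zip(self.h, self.s, self.tables):
            for i, v in enumerate(x):
                if v:
                    table[h[i]] += s[i] * v

    def estimate(self, i):
        """x*_i: median over rows of the signed bucket containing i."""
        return statistics.median(
            [s[i] * table[h[i]] for h, s, table in zip(self.h, self.s, self.tables)]
        )


# Demo: five heavy coordinates (value 100) and +/-1 noise elsewhere.
noise = random.Random(1)
x = [100.0] * 5 + [noise.choice((-1.0, 1.0)) for _ in range(995)]
cs = CountSketch(len(x), width=200, rows=7, seed=0)
cs.add(x)
heavy_estimates = [cs.estimate(i) for i in range(5)]
```

Because the update is linear, sketching $-x$ with the same seed yields exactly the negated estimates.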



3.1 Non-sparse $\ell_2$

It was shown in [CM06] that, if $x^*$ is the result of a Count-Sketch with hash table size $O(k/\epsilon^2)$, then outputting the top $k$ elements of $x^*$ gives a $(1+\epsilon)$-approximate $\ell_2/\ell_2$ recovery scheme. Here we show that a seemingly minor change, selecting $2k$ elements rather than $k$ elements, turns this into a $(1+\epsilon^2)$-approximate $\ell_2/\ell_2$ recovery scheme.

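Theorem 3.1. Let $\hat{x}$ be the top $2k$ estimates from a Count-Sketch structure with hash table size $O(k/\epsilon)$. Then with failure probability $n^{-\Omega(1)}$,

$$\|\hat{x} - x\|_2 \le (1+\epsilon)\|x_{\overline{[k]}}\|_2.$$

Therefore, there is a $(1+\epsilon)$-approximate $\ell_2/\ell_2$ recovery scheme with $O(\frac{1}{\epsilon}k\log n)$ rows.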

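Proof. Let the hash table size be $O(ck/\epsilon)$ for constant $c$, and let $x^*$ be the vector of estimates for each coordinate. Define $S$ to be the indices of the largest $2k$ values in $x^*$, and $E = \|x_{\overline{[k]}}\|_2$. By (3), the standard analysis of Count-Sketch:

$$\|x^* - x\|_\infty^2 \le \frac{\epsilon}{ck}\|x_{\overline{[ck/\epsilon]}}\|_2^2 \le \frac{\epsilon}{ck}E^2,$$

so

$$\begin{aligned}
\|x^*_S - x\|_2^2 - E^2 &= \|(x^* - x)_S\|_2^2 + \|x_{\overline{S}}\|_2^2 - \|x_{\overline{[k]}}\|_2^2 \\
&\le |S|\,\|x^* - x\|_\infty^2 + \|x_{[k]\setminus S}\|_2^2 - \|x_{S\setminus[k]}\|_2^2 \\
&\le \frac{2\epsilon}{c}E^2 + \|x_{[k]\setminus S}\|_2^2 - \|x_{S\setminus[k]}\|_2^2.
\end{aligned} \tag{4}$$

Let $a = \max_{i \in [k]\setminus S}|x_i|$ and $b = \min_{i \in S\setminus[k]}|x_i|$, and let $d = |[k]\setminus S|$, so that $|S\setminus[k]| = k + d$. The algorithm passes over an element of value $a$ to choose one of value $b$, so

$$a \le b + 2\|x^* - x\|_\infty \le b + 2\sqrt{\frac{\epsilon}{ck}}\,E.$$

Then

$$\begin{aligned}
\|x_{[k]\setminus S}\|_2^2 - \|x_{S\setminus[k]}\|_2^2 &\le da^2 - (k+d)b^2 \\
&\le d\Big(b + 2\sqrt{\tfrac{\epsilon}{ck}}\,E\Big)^2 - (k+d)b^2 \\
&= -kb^2 + 4\sqrt{\tfrac{\epsilon}{ck}}\,dbE + \frac{4\epsilon}{ck}dE^2 \\
&= -k\Big(b - \frac{2}{k}\sqrt{\tfrac{\epsilon}{ck}}\,dE\Big)^2 + \frac{4\epsilon}{ck^2}dE^2(k+d) \\
&\le \frac{4d(k+d)\epsilon}{ck^2}E^2 \le \frac{8\epsilon}{c}E^2,
\end{aligned}$$

using $d \le k$ in the last step. Combining this with (4) gives

$$\|x^*_S - x\|_2^2 - E^2 \le \frac{10\epsilon}{c}E^2,$$

or

$$\|x^*_S - x\|_2 \le \Big(1 + \frac{5\epsilon}{c}\Big)E,$$

which proves the theorem for $c \ge 5$.

□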
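The procedure analyzed in Theorem 3.1 is short enough to sketch end to end. The Python below is a minimal illustration under the following assumptions:

- Explicit random hash tables stand in for the hash families of [CCF02].
- The hash table size is taken as $\lceil ck/\epsilon\rceil$ with an arbitrary constant `c=16`.
- The $2k$ largest estimates are chosen by magnitude.

The names are hypothetical. The demo checks the guarantee $\|\hat{x} - x\|_2 \le (1+\epsilon)\|x_{\overline{[k]}}\|_2$ on one random instance; it is a sanity check, not a proof.

```python
import math
import random
import statistics


def count_sketch_estimates(x, width, rows, seed=0):
    """Median-of-rows Count-Sketch estimate of every coordinate of x."""
    rng = random.Random(seed)
    n = len(x)
    per_row = []
    for _ in range(rows):
        h = [rng.randrange(width) for _ in range(n)]
        s = [rng.choice((-1, 1)) for _ in range(n)]
        table = [0.0] * width
        for i, v in enumerate(x):
            table[h[i]] += s[i] * v
        per_row.append([s[i] * table[h[i]] for i in range(n)])
    return [statistics.median([row[i] for row in per_row]) for i in range(n)]


def top_2k_recovery(x, k, eps, c=16, rows=7, seed=0):
    """Keep the 2k largest-magnitude Count-Sketch estimates (Theorem 3.1)."""
    width = math.ceil(c * k / eps)
    est = count_sketch_estimates(x, width, rows, seed)
    keep = sorted(range(len(x)), key=lambda i: abs(est[i]), reverse=True)[: 2 * k]
    x_hat = [0.0] * len(x)
    for i in keep:
        x_hat[i] = est[i]
    return x_hat


def l2(v):
    return math.sqrt(sum(t * t for t in v))


# Demo: k heavy entries followed by +/-1 noise (x is sorted by magnitude).
k, eps = 5, 0.25
noise = random.Random(2)
x = [20.0] * k + [noise.choice((-1.0, 1.0)) for _ in range(2000 - k)]
x_hat = top_2k_recovery(x, k, eps)
err = l2([a - b for a, b in zip(x_hat, x)])
tail = l2(x[k:])
```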



3.2 Non-sparse $\ell_1$

Theorem 3.2. There exists a $(1+\epsilon)$-approximate $\ell_1/\ell_1$ recovery scheme with $O(\frac{\log^3 1/\epsilon}{\sqrt{\epsilon}}k\log n)$ measurements and failure probability $e^{-\Omega(k/\sqrt{\epsilon})} + n^{-\Omega(1)}$.

Set $f = \sqrt{\epsilon}$, so our goal is to get $(1+f^2)$-approximate $\ell_1/\ell_1$ recovery with $O(\frac{\log^3 1/f}{f}k\log n)$ measurements.

For intuition, consider 1-sparse recovery of the following vector $x$: let $c \in [0, 2]$ and set $x_1 = 1/f^2$ and $x_2, \ldots, x_{1 + 1/f^{1+c}} \in \{\pm 1\}$. Then we have

$$\|x_{\overline{[1]}}\|_1 = 1/f^{1+c},$$

and by (3), a Count-Sketch with $O(1/f)$-sized hash tables returns $x^*$ with

$$\|x^* - x\|_\infty \le \sqrt{f}\,\|x_{\overline{[1/f]}}\|_2 \approx 1/f^{c/2} = f^{1+c/2}\,\|x_{\overline{[1]}}\|_1.$$

The reconstruction algorithm therefore cannot reliably find any of the $x_i$ for $i > 1$, and its error on $x_1$ is at least $f^{1+c/2}\|x_{\overline{[1]}}\|_1$. Hence the algorithm will not do better than a $(1 + f^{1+c/2})$-approximation.

However, consider what happens if we subsample an $f^c$ fraction of the vector. The result probably has about $1/f$ non-zero values, so an $O(1/f)$-width Count-Sketch can reconstruct it exactly. Putting this in our output improves the overall $\ell_1$ error by about $1/f = f^c\|x_{\overline{[1]}}\|_1$. Since $c < 2$, this more than cancels the $f^{1+c/2}\|x_{\overline{[1]}}\|_1$ error the initial Count-Sketch makes on $x_1$, giving an approximation factor better than 1.
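The exponent bookkeeping in this example can be checked numerically. This minimal Python sketch assumes the vector just described: $x_1$ large, followed by $1/f^{1+c}$ entries of magnitude 1. The function names are illustrative.

- `loss_and_gain` returns the two fractions of $\|x_{\overline{[1]}}\|_1$ compared above.
- `sketch_error_fraction` evaluates $\sqrt{f}\,\|x_{\overline{[1/f]}}\|_2 / \|x_{\overline{[1]}}\|_1$ exactly for this vector, to confirm that it is close to $f^{1+c/2}$.

```python
import math


def loss_and_gain(f, c):
    """Fractions of ||x_{[1]^c}||_1 in the Section 3.2 example:
    loss = f^(1 + c/2): error of the O(1/f)-width Count-Sketch on x_1;
    gain = f^c: tail mass recovered from an f^c-rate subsample."""
    return f ** (1 + c / 2), f ** c


def sketch_error_fraction(f, c):
    """sqrt(f) * ||x_{[1/f]^c}||_2 / ||x_{[1]^c}||_1 for the example vector."""
    m = round(f ** -(1 + c))          # number of +/-1 entries
    top = round(1 / f)                # x_1 plus (1/f - 1) of the +/-1 entries
    outside = m - (top - 1)           # +/-1 entries outside the top 1/f
    return math.sqrt(f) * math.sqrt(outside) / m


f = 0.01
pairs = {c: loss_and_gain(f, c) for c in (0.5, 1.0, 1.5, 1.9)}
ratio = sketch_error_fraction(f, 1.5) / f ** 1.75
```

For every $c < 2$ the gain exceeds the loss, and at $c = 2$ the two coincide.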
This tells us that subsampling can help. We don't need to subsample at a scale below $k/f$ (where we can reconstruct well already) or above $k/f^3$ (where the $\ell_2$ bound is small enough already), but in the intermediate range we need to subsample. Our algorithm subsamples at all $\log 1/f^2$ rates in between these two endpoints, and combines the heavy hitters from each.

First we analyze how subsampled Count-Sketch works. 

Lemma 3.3. Suppose we subsample with probability $p$ and then apply Count-Sketch with $\Theta(\log n)$ rows and $\Theta(q)$-sized hash tables. Let $y$ be the subsample of $x$. Then with failure probability $e^{-\Omega(q)} + n^{-\Omega(1)}$ we recover a $y^*$ with

$$\|y^* - y\|_\infty \le \sqrt{p/q}\,\|x_{\overline{[q/p]}}\|_2.$$

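Proof. Recall the following form of the Chernoff bound: if $X_1, \ldots, X_m$ are independent with $0 \le X_i \le M$, and $\mu \ge \mathbb{E}[\sum X_i]$, then

$$\Pr\Big[\sum X_i \ge \tfrac{4}{3}\mu\Big] \le e^{-\Omega(\mu/M)}.$$

Let $T$ be the set of coordinates in the sample. Then $\mathbb{E}\big[|T \cap [\tfrac{3q}{2p}]|\big] = \frac{3q}{2}$, so

$$\Pr\Big[\big|T \cap [\tfrac{3q}{2p}]\big| \ge 2q\Big] \le e^{-\Omega(q)}.$$

Suppose this event does not happen, so $|T \cap [\frac{3q}{2p}]| < 2q$. We also have

$$\|x_{\overline{[q/p]}}\|_2 \ge \sqrt{\frac{q}{2p}}\,|x_{3q/(2p)}|.$$

Let $Y_i = 0$ if $i \notin T$ and $Y_i = x_i^2$ if $i \in T$. Then

$$\mathbb{E}\Big[\sum_{i > 3q/(2p)} Y_i\Big] = p\,\|x_{\overline{[3q/(2p)]}}\|_2^2 \le p\,\|x_{\overline{[q/p]}}\|_2^2.$$

For $i > \frac{3q}{2p}$ we have

$$Y_i \le |x_{3q/(2p)}|^2 \le \frac{2p}{q}\|x_{\overline{[q/p]}}\|_2^2,$$

giving by Chernoff that

$$\Pr\Big[\sum_{i > 3q/(2p)} Y_i \ge \tfrac{4}{3}p\,\|x_{\overline{[q/p]}}\|_2^2\Big] \le e^{-\Omega(q/2)}.$$

But if this event does not happen, then, since fewer than $2q$ sampled coordinates lie in $[\frac{3q}{2p}]$,

$$\|y_{\overline{[2q]}}\|_2^2 \le \sum_{i \in T,\; i > 3q/(2p)} x_i^2 = \sum_{i > 3q/(2p)} Y_i < \tfrac{4}{3}p\,\|x_{\overline{[q/p]}}\|_2^2.$$

By (3), using $O(2q)$-size hash tables gives a $y^*$ with

$$\|y^* - y\|_\infty \le \frac{1}{\sqrt{2q}}\|y_{\overline{[2q]}}\|_2 \le \sqrt{p/q}\,\|x_{\overline{[q/p]}}\|_2$$

with failure probability $n^{-\Omega(1)}$, as desired.

□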
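The two probabilistic events used in this proof can be observed on a concrete vector. The following minimal Python sketch assumes coordinates are already sorted by magnitude, as in this section. It uses $x_i = 1/i$ purely as an example, and the function name is illustrative. It reports two quantities:

- $|T \cap [\frac{3q}{2p}]|$, which the proof needs below $2q$;
- $\|y_{\overline{[2q]}}\|_2^2$, which the proof needs below $\frac{4}{3}p\|x_{\overline{[q/p]}}\|_2^2$.

```python
import random


def lemma33_events(x_sorted, p, q, seed=0):
    """Subsample coordinates with probability p and evaluate the events in Lemma 3.3.

    x_sorted: magnitudes |x_1| >= |x_2| >= ... (0-indexed list).
    Returns (head_count, tail_energy, bound)."""
    rng = random.Random(seed)
    T = [i for i in range(len(x_sorted)) if rng.random() < p]   # increasing indices
    head_count = sum(1 for i in T if i < 3 * q / (2 * p))
    y_sorted = [x_sorted[i] for i in T]          # still sorted by magnitude
    tail_energy = sum(v * v for v in y_sorted[2 * q:])
    x_tail_energy = sum(v * v for v in x_sorted[int(q / p):])
    return head_count, tail_energy, (4 / 3) * p * x_tail_energy


x_sorted = [1.0 / (i + 1) for i in range(100_000)]
p, q = 0.05, 200
head_count, tail_energy, bound = lemma33_events(x_sorted, p, q)
```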
Let $r = 2\log 1/f$. Our algorithm is as follows: for $j \in \{0, \ldots, r\}$, we find and estimate the $2^{j/2}k$ largest elements not found in previous levels in a subsampled Count-Sketch with probability $p = 2^{-j}$ and hash size $q = ck/f$ for some parameter $c = \Theta(r^2)$. We output $\hat{x}$, the union of all these estimates. Our goal is to show

$$\|\hat{x} - x\|_1 - \|x_{\overline{[k]}}\|_1 \le O(f^2)\,\|x_{\overline{[k]}}\|_1.$$



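For each level $j$, let $S_j$ be the $2^{j/2}k$ largest coordinates in our estimate not found in $S_0 \cup \cdots \cup S_{j-1}$. Let $S = \bigcup S_j$. By Lemma 3.3, for each $j$ we have (with failure probability $e^{-\Omega(k/f)} + n^{-\Omega(1)}$)

$$\|(\hat{x} - x)_{S_j}\|_1 \le |S_j|\sqrt{\frac{2^{-j}f}{ck}}\,\|x_{\overline{[2^jck/f]}}\|_2 \le 2^{j/2}k\sqrt{\frac{2^{-j}f}{ck}}\,\|x_{\overline{[2k/f]}}\|_2 = \sqrt{\frac{fk}{c}}\,\|x_{\overline{[2k/f]}}\|_2$$

(using $c \ge 2$), and so

$$\|(\hat{x} - x)_S\|_1 \le (r+1)\sqrt{\frac{fk}{c}}\,\|x_{\overline{[2k/f]}}\|_2. \tag{5}$$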

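By standard arguments, the $\ell_\infty$ bound for $S_0$ (every element of $[k] \setminus S_0$ was passed over for an element of $S_0 \setminus [k]$ with a larger estimate) gives, with $x^*$ the level-0 estimate,

$$\|x_{[k]}\|_1 \le \|x_{S_0}\|_1 + 2k\,\|x^* - x\|_\infty \le \|x_{S_0}\|_1 + 2\sqrt{\frac{fk}{c}}\,\|x_{\overline{[2k/f]}}\|_2. \tag{6}$$

Combining Equations (5) and (6) gives

$$\begin{aligned}
\|\hat{x} - x\|_1 &= \|(\hat{x} - x)_S\|_1 + \|x_{\overline{S}}\|_1 \\
&= \|(\hat{x} - x)_S\|_1 + \|x\|_1 - \|x_S\|_1 \\
&= \|(\hat{x} - x)_S\|_1 + \|x_{\overline{[k]}}\|_1 + \big(\|x_{[k]}\|_1 - \|x_{S_0}\|_1\big) - \sum_{j=1}^{r}\|x_{S_j}\|_1 \\
&\le \|x_{\overline{[k]}}\|_1 + (r+3)\sqrt{\frac{fk}{c}}\,\|x_{\overline{[2k/f]}}\|_2 - \sum_{j=1}^{r}\|x_{S_j}\|_1.
\end{aligned} \tag{7}$$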



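We would like to convert the $\ell_2$ term in (7) to depend on the $\ell_1$ norm. For any $u$ and $s$ we have, by splitting into chunks of size $s$ (each entry of a chunk is at most the average magnitude of the preceding chunk), that

$$\|u_{\overline{[2s]}}\|_2 \le \frac{1}{\sqrt{s}}\,\|u_{\overline{[s]}}\|_1.$$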



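Along with the triangle inequality, this gives us that (recall $2^r = 1/f^2$, so $2^{r+1}k/f = 2k/f^3$)

$$\begin{aligned}
\|x_{\overline{[2k/f]}}\|_2 &\le \|x_{\overline{[2k/f^3]}}\|_2 + \sum_{j=1}^{r}\|x_{\overline{[2^jk/f]}\cap[2^{j+1}k/f]}\|_2 \\
&\le \sqrt{\frac{f^3}{k}}\,\|x_{\overline{[k/f^3]}}\|_1 + \sum_{j=1}^{r}\sqrt{\frac{2^jk}{f}}\,|x_{2^jk/f}|,
\end{aligned}$$

so, using $\|x_{\overline{[k/f^3]}}\|_1 \le \|x_{\overline{[k]}}\|_1$,

$$\|\hat{x} - x\|_1 \le \Big(1 + \frac{r+3}{\sqrt{c}}f^2\Big)\|x_{\overline{[k]}}\|_1 + \frac{r+3}{\sqrt{c}}\sum_{j=1}^{r}2^{j/2}k\,|x_{2^jk/f}| - \sum_{j=1}^{r}\|x_{S_j}\|_1. \tag{8}$$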
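The chunking inequality $\|u_{\overline{[2s]}}\|_2 \le \frac{1}{\sqrt{s}}\|u_{\overline{[s]}}\|_1$ used above can be spot-checked numerically. Every chunk beyond position $2s$ has $\ell_2$ norm at most $\frac{1}{\sqrt{s}}$ times the $\ell_1$ norm of the chunk before it. The following minimal Python check runs on random heavy-tailed vectors; it is an illustration, not a proof, and the names are hypothetical.

```python
import math
import random


def chunk_bound(u, s):
    """Return (lhs, rhs) with lhs = ||u outside its 2s largest||_2 and
    rhs = ||u outside its s largest||_1 / sqrt(s)."""
    a = sorted((abs(v) for v in u), reverse=True)
    lhs = math.sqrt(sum(v * v for v in a[2 * s:]))
    rhs = sum(a[s:]) / math.sqrt(s)
    return lhs, rhs


rng = random.Random(3)
trials = []
for _ in range(200):
    n = rng.randint(1, 300)
    u = [rng.paretovariate(1.2) * rng.choice((-1, 1)) for _ in range(n)]
    s = rng.randint(1, 20)
    trials.append(chunk_bound(u, s))
```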



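Define $a_j = 2^{j/2}k\,|x_{2^jk/f}|$. The first term grows as $f^2$ so it is fine, but $a_j$ can grow as $f2^{-j/2}\|x_{\overline{[k]}}\|_1 > f^2\|x_{\overline{[k]}}\|_1$. We need to show that they are canceled by the corresponding $\|x_{S_j}\|_1$. In particular, we will show that

$$\|x_{S_j}\|_1 \ge \Omega(a_j) - O(f^2)\,\|x_{\overline{[k]}}\|_1$$

with high probability, at least wherever $a_j \ge \|a\|_1/(2r)$.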



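Let $U \subseteq [r]$ be the set of $j$ with $a_j \ge \|a\|_1/(2r)$, so that $\|a_U\|_1 \ge \|a\|_1/2$. We have

$$\|x_{\overline{[2^jk/f]}}\|_2^2 \le \|x_{\overline{[2k/f^3]}}\|_2^2 + \sum_{i=j}^{r}\|x_{\overline{[2^ik/f]}\cap[2^{i+1}k/f]}\|_2^2 \le \frac{f^3}{k}\|x_{\overline{[k/f^3]}}\|_1^2 + \frac{1}{kf}\sum_{i=j}^{r}a_i^2. \tag{9}$$

For $j \in U$, we have

$$\sum_{i=j}^{r}a_i^2 \le \|a\|_1^2 \le 4r^2a_j^2,$$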

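so, along with $(y^2 + z^2)^{1/2} \le y + z$, we turn Equation (9) into

$$\|x_{\overline{[2^jk/f]}}\|_2 \le \sqrt{\frac{f^3}{k}}\,\|x_{\overline{[k/f^3]}}\|_1 + \sqrt{\frac{1}{kf}\sum_{i=j}^{r}a_i^2} \le \sqrt{\frac{f^3}{k}}\,\|x_{\overline{[k/f^3]}}\|_1 + \frac{2r}{\sqrt{kf}}\,a_j.$$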



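When choosing $S_j$, let $T \subseteq [n]$ be the set of indices chosen in the sample. Applying Lemma 3.3, the estimate $x^*$ of $x_T$ has

$$\begin{aligned}
\|x^* - x_T\|_\infty &\le \sqrt{\frac{2^{-j}f}{ck}}\,\|x_{\overline{[2^jck/f]}}\|_2 \le \sqrt{\frac{2^{-j}f}{ck}}\,\|x_{\overline{[2^jk/f]}}\|_2 \\
&\le \frac{2^{-j/2}f^2}{\sqrt{c}\,k}\,\|x_{\overline{[k/f^3]}}\|_1 + \frac{2r}{\sqrt{c}}\cdot\frac{2^{-j/2}a_j}{k} = \frac{2^{-j/2}f^2}{\sqrt{c}\,k}\,\|x_{\overline{[k/f^3]}}\|_1 + \sqrt{\frac{4r^2}{c}}\,|x_{2^jk/f}|
\end{aligned}$$

for $j \in U$.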

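Let $Q = [2^jk/f] \setminus (S_0 \cup \cdots \cup S_{j-1})$. We have $|Q| \ge 2^{j-1}k/f$ (since $|S_0 \cup \cdots \cup S_{j-1}| = O(2^{j/2}k)$, which is at most $2^{j-1}k/f$ for small $f$), so $\mathbb{E}[|Q \cap T|] \ge k/(2f)$ and $|Q \cap T| \ge k/(4f)$ with failure probability $e^{-\Omega(k/f)}$. Conditioned on $|Q \cap T| \ge k/(4f)$, since $x_T$ has at least $|Q \cap T| \ge k/(4f) = 2^{r/2}k/4 \ge 2^{j/2}k/4$ possible choices of value at least $|x_{2^jk/f}|$, each with estimate at least $|x_{2^jk/f}| - \|x^* - x_T\|_\infty$, $x_{S_j}$ must have at least $k2^{j/2}/4$ elements of value at least $|x_{2^jk/f}| - 2\|x^* - x_T\|_\infty$. Therefore, for $j \in U$,

$$\|x_{S_j}\|_1 \ge \frac{k2^{j/2}}{4}\Big(|x_{2^jk/f}| - 2\|x^* - x_T\|_\infty\Big) \ge \frac{1}{4}\Big(1 - \frac{4r}{\sqrt{c}}\Big)a_j - \frac{f^2}{2\sqrt{c}}\,\|x_{\overline{[k/f^3]}}\|_1$$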
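and therefore

$$\sum_{j=1}^{r}\|x_{S_j}\|_1 \ge \sum_{j \in U}\|x_{S_j}\|_1 \ge \frac{1}{4}\Big(1 - \frac{4r}{\sqrt{c}}\Big)\|a_U\|_1 - \frac{rf^2}{2\sqrt{c}}\,\|x_{\overline{[k]}}\|_1 \ge \frac{1}{8}\Big(1 - \frac{4r}{\sqrt{c}}\Big)\|a\|_1 - \frac{rf^2}{2\sqrt{c}}\,\|x_{\overline{[k]}}\|_1. \tag{10}$$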



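Using (8) and (10) we get

$$\|\hat{x} - x\|_1 \le \Big(1 + \frac{3r+6}{2\sqrt{c}}f^2\Big)\|x_{\overline{[k]}}\|_1 - \Big(\frac{1}{8} - \frac{3r+6}{2\sqrt{c}}\Big)\|a\|_1 \le \Big(1 + \frac{f^2}{8}\Big)\|x_{\overline{[k]}}\|_1$$

for $c = (12r + 24)^2 = \Theta(r^2)$. Hence we use a total of $(r+1)\cdot O(\frac{ck}{f}\log n) = O(\frac{\log^3 1/f}{f}k\log n)$ measurements for $(1+f^2)$-approximate $\ell_1/\ell_1$ recovery.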

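For each $j \in \{0, \ldots, r\}$ we had failure probability $e^{-\Omega(k/f)} + n^{-\Omega(1)}$ (from Lemma 3.3 and $|Q \cap T| \ge k/(4f)$). By the union bound, our overall failure probability is at most

$$(r+1)\big(e^{-\Omega(k/f)} + n^{-\Omega(1)}\big) = e^{-\Omega(k/\sqrt{\epsilon})} + n^{-\Omega(1)},$$

proving Theorem 3.2.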
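The level structure of this algorithm can be sketched compactly. The Python below is a minimal illustration, not the paper's implementation. It makes these assumptions:

- Each level uses a Count-Sketch with explicit random hash tables.
- The number of levels is set by $r = \lceil 2\log_2(1/f)\rceil$.
- Level $j$ keeps $\lceil 2^{j/2}k\rceil$ new coordinates.
- A moderate constant `c=16.0` replaces the $c = \Theta(r^2)$ required by the analysis.

Because of the last assumption, the demo only sanity-checks behavior rather than the exact $(1+f^2)$ bound. All names are hypothetical.

```python
import math
import random
import statistics


def count_sketch_estimates(x, width, rows, rng):
    """Median-of-rows Count-Sketch estimate of every coordinate of x."""
    n = len(x)
    per_row = []
    for _ in range(rows):
        h = [rng.randrange(width) for _ in range(n)]
        s = [rng.choice((-1, 1)) for _ in range(n)]
        table = [0.0] * width
        for i, v in enumerate(x):
            table[h[i]] += s[i] * v
        per_row.append([s[i] * table[h[i]] for i in range(n)])
    return [statistics.median([row[i] for row in per_row]) for i in range(n)]


def multilevel_l1_recovery(x, k, eps, c=16.0, rows=7, seed=0):
    """Level j = 0..r: subsample with p = 2^-j, sketch with about c*k/f buckets
    (f = sqrt(eps)), and keep the ceil(2^(j/2) k) largest new estimates."""
    rng = random.Random(seed)
    n = len(x)
    f = math.sqrt(eps)
    r = math.ceil(2 * math.log2(1 / f))
    width = max(1, int(c * k / f))
    x_hat, chosen = [0.0] * n, set()
    for j in range(r + 1):
        sample = [i for i in range(n) if rng.random() < 2.0 ** (-j)]
        y = [0.0] * n                                  # the subsample of x
        for i in sample:
            y[i] = x[i]
        est = count_sketch_estimates(y, width, rows, rng)
        fresh = sorted((i for i in sample if i not in chosen),
                       key=lambda i: abs(est[i]), reverse=True)
        for i in fresh[: math.ceil(2 ** (j / 2) * k)]:
            x_hat[i] = est[i]
            chosen.add(i)
    return x_hat, r


# Demo: k heavy entries, then +/-1 noise (x is sorted by magnitude).
k, eps = 3, 0.04
noise = random.Random(1)
x = [100.0] * k + [noise.choice((-1.0, 1.0)) for _ in range(2000 - k)]
x_hat, r = multilevel_l1_recovery(x, k, eps)
tail_l1 = sum(abs(v) for v in x[k:])               # ||x outside [k]||_1
l1_err = sum(abs(a - b) for a, b in zip(x_hat, x))
budget = sum(math.ceil(2 ** (j / 2) * k) for j in range(r + 1))
```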



4 Lower bounds for non-sparse output and p = 2 

In this case, the lower bound follows fairly straightforwardly from the Shannon-Hartley information capacity 
of a Gaussian channel. 

We will set up a communication game. Let $\mathcal{F} \subset \{S \subset [n] \mid |S| = k\}$ be a family of $k$-sparse supports such that:

• $|S \triangle S'| \ge k$ for $S \ne S' \in \mathcal{F}$,

• $\Pr_{S \in \mathcal{F}}[i \in S] = k/n$ for all $i \in [n]$, and

• $\log|\mathcal{F}| = \Omega(k\log(n/k))$.

This is possible; for example, a random linear code on $[n/k]^k$ with relative distance $1/2$ has these properties [Gur10].³

Let $X = \{x \in \{0, \pm 1\}^n \mid \mathrm{supp}(x) \in \mathcal{F}\}$. Let $w \sim N(0, \alpha\frac{k}{n}I_n)$ be i.i.d. normal with variance $\alpha k/n$ in each coordinate. Consider the following process:

³This assumes $n/k$ is a prime power larger than 2. If $n/k$ is not prime, we can choose $n' \in [n/2, n]$ to be a prime multiple of $k$, and restrict to the first $n'$ coordinates. This works unless $n/k < 3$, in which case a bound of $\Theta(\min(n, \frac{1}{\epsilon}k\log(n/k))) = \Theta(k)$ is trivial.
Procedure. First, Alice chooses $S \in \mathcal{F}$ uniformly at random, then $x \in X$ uniformly at random subject to $\mathrm{supp}(x) = S$, then $w \sim N(0, \alpha\frac{k}{n}I_n)$. She sets $y = A(x + w)$ and sends $y$ to Bob. Bob performs sparse recovery on $y$ to recover $x' \approx x$, rounds to $X$ by $\hat{x} = \arg\min_{\hat{x} \in X}\|\hat{x} - x'\|_2$, and sets $S' = \mathrm{supp}(\hat{x})$. This gives a Markov chain $S \to x \to y \to x' \to S'$.

If sparse recovery works for any $x + w$ with probability $1 - \delta$ as a distribution over $A$, then there is some specific $A$ and random seed such that sparse recovery works with probability $1 - \delta$ over $x + w$; let us choose this $A$ and the random seed, so that Alice and Bob run deterministic algorithms on their inputs.

Lemma 4.1. $I(S; S') = O(m\log(1 + \frac{1}{\alpha}))$.

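Proof. Let the rows of $A$ be $v^1, \ldots, v^m$. We may assume that the $v^i$ are orthonormal, because this can be accomplished via a unitary transformation on $Ax$. Then we have that $y_i = \langle v^i, x + w\rangle = \langle v^i, x\rangle + w'_i$, where $w'_i \sim N(0, \alpha k\|v^i\|_2^2/n) = N(0, \alpha k/n)$ and

$$\mathbb{E}_x\big[\langle v^i, x\rangle^2\big] = \mathbb{E}_S\Big[\sum_{j \in S}(v^i_j)^2\Big] = \frac{k}{n}.$$

Hence $y_i = z_i + w'_i$ (with $z_i = \langle v^i, x\rangle$) is a Gaussian channel with power constraint $\mathbb{E}[z_i^2] \le \frac{k}{n}\|v^i\|_2^2$ and noise variance $\mathbb{E}[(w'_i)^2] = \alpha\frac{k}{n}\|v^i\|_2^2$. Hence by the Shannon-Hartley theorem this channel has information capacity

$$\max I(z_i; y_i) = C \le \frac{1}{2}\log\Big(1 + \frac{1}{\alpha}\Big).$$

By the data processing inequality for Markov chains and the chain rule for entropy, this means

$$\begin{aligned}
I(S; S') \le I(z; y) &= H(y) - H(y \mid z) = H(y) - H(y - z \mid z) \\
&= H(y) - \sum_i H(w'_i \mid z, w'_1, \ldots, w'_{i-1}) \\
&= H(y) - \sum_i H(w'_i) \le \sum_i \big(H(y_i) - H(w'_i)\big) \\
&= \sum_i \big(H(y_i) - H(y_i \mid z_i)\big) = \sum_i I(y_i; z_i) \le \frac{m}{2}\log\Big(1 + \frac{1}{\alpha}\Big).
\end{aligned} \tag{11}$$

□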
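The capacity bound is what turns Lemma 4.1 into a measurement count. The following minimal Python sketch measures capacities in bits (so $\log = \log_2$). It treats the hidden constant in $\log|\mathcal{F}| = \Omega(k\log(n/k))$ as a parameter `const`. The function names are illustrative.

With $\alpha = 1/\epsilon$, each measurement carries at most $\frac12\log_2(1+\epsilon) \le \frac{\epsilon}{2\ln 2}$ bits. Transmitting $\Omega(k\log(n/k))$ bits therefore needs $\Omega(\frac{1}{\epsilon}k\log(n/k))$ measurements.

```python
import math


def capacity_bits(alpha):
    """Shannon-Hartley capacity (bits per use) of a Gaussian channel with SNR 1/alpha."""
    return 0.5 * math.log2(1 + 1 / alpha)


def measurements_needed(k, n, eps, const=1.0):
    """Smallest m with m * capacity >= const * k * log2(n/k), taking alpha = 1/eps."""
    return const * k * math.log2(n / k) / capacity_bits(1 / eps)


eps_values = (0.5, 0.1, 0.01)
caps = [capacity_bits(1 / e) for e in eps_values]
m_small_eps = measurements_needed(10, 10**6, 0.01)
```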

We will show that successful recovery either recovers most of $x$, in which case $I(S; S') = \Omega(k\log(n/k))$, or recovers an $\epsilon$ fraction of $w$. First we show that recovering $w$ requires $m = \Omega(\epsilon n)$.

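Lemma 4.2. Suppose $w \in \mathbb{R}^n$ with $w_i \sim N(0, \sigma^2)$ for all $i$ and $n = \Omega(\frac{1}{\epsilon^2}\log(1/\delta))$, and $A \in \mathbb{R}^{m \times n}$ for $m < \delta\epsilon n$. Then any algorithm that finds $w'$ from $Aw$ must have $\|w' - w\|_2^2 > (1 - \epsilon)\|w\|_2^2$ with probability at least $1 - O(\delta)$.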

Proof. Note that $Aw$ merely gives the projection of $w$ onto $m$ dimensions, giving no information about the other $n - m$ dimensions. Since $w$ and the $\ell_2$ norm are rotation invariant, we may assume WLOG that $A$ gives the projection of $w$ onto the first $m$ dimensions, namely $T = [m]$. By the norm concentration of Gaussians, with probability $1 - \delta$ we have $\|w\|_2^2 < (1 + \epsilon)n\sigma^2$, and by Markov with probability $1 - \delta$ we have $\|w_T\|_2^2 < \epsilon n\sigma^2$.

For any fixed value $d$, since $w$ is uniform Gaussian and $w'_{\overline{T}}$ is independent of $w_{\overline{T}}$,

$$\Pr[\|w' - w\|_2^2 < d] \le \Pr[\|(w' - w)_{\overline{T}}\|_2^2 < d] \le \Pr[\|w_{\overline{T}}\|_2^2 < d].$$
Therefore

$$\begin{aligned}
\Pr[\|w' - w\|_2^2 < (1 - 3\epsilon)\|w\|_2^2] &\le \Pr[\|w' - w\|_2^2 < (1 - 2\epsilon)n\sigma^2] \\
&\le \Pr[\|w_{\overline{T}}\|_2^2 < (1 - 2\epsilon)n\sigma^2] \\
&\le \Pr[\|w_{\overline{T}}\|_2^2 < (1 - \epsilon)(n - m)\sigma^2] \le \delta
\end{aligned}$$

as desired. Rescaling $\epsilon$ gives the result. □
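Lemma 4.2 can be illustrated by simulation. The Python sketch below assumes, as the proof does, that the measurements reveal exactly $w_T$ for $T = [m]$. It uses the natural estimator $w' = (w_T, 0)$; this particular estimator is an assumption for illustration, whereas the lemma covers every estimator. The squared error stays close to $(1 - m/n)\|w\|_2^2$. Names are hypothetical.

```python
import random


def projection_error_ratio(n, m, seed=0):
    """||w' - w||_2^2 / ||w||_2^2 for w ~ N(0, I_n) and w' = (w_1..w_m, 0, ..., 0)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w_prime = w[:m] + [0.0] * (n - m)
    err = sum((a - b) ** 2 for a, b in zip(w_prime, w))
    return err / sum(v * v for v in w)


ratio = projection_error_ratio(n=20_000, m=100)
```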

Lemma 4.3. Suppose $n = \Omega(\frac{1}{\epsilon^2} + \frac{k}{\epsilon}\log\frac{k}{\epsilon})$ and $m = O(\epsilon n)$. Then $I(S; S') = \Omega(k\log(n/k))$ for some $\alpha = \Omega(1/\epsilon)$.

Proof. Consider the $x'$ recovered from $A(x + w)$, and let $T = S \cup S'$. Suppose that $\|w\|_\infty^2 \le O(\frac{\alpha k}{n}\log n)$ and $\|w\|_2^2/(\alpha k) \in [1 \pm \epsilon]$, as happens with probability at least (say) $3/4$. Then we claim that if recovery is successful, one of the following must be true:

$$\|x'_T - x\|_2^2 \le 9\epsilon\|w\|_2^2 \tag{12}$$

$$\|x'_{\overline{T}} - w_{\overline{T}}\|_2^2 \le (1 - 2\epsilon)\|w\|_2^2 \tag{13}$$

To show this, suppose $\|x'_T - x\|_2^2 > 9\epsilon\|w\|_2^2 \ge 9\|w_T\|_2^2$ (the last by $|T| \le 2k = O(\epsilon n/\log n)$). Then

$$\|(x' - (x + w))_T\|_2^2 \ge \big(\|x'_T - x\|_2 - \|w_T\|_2\big)^2 \ge \big(2\|x'_T - x\|_2/3\big)^2 \ge 4\epsilon\|w\|_2^2.$$

Because recovery is successful,

$$\|x' - (x + w)\|_2^2 \le (1 + \epsilon)\|w\|_2^2.$$

Therefore

$$\begin{aligned}
\|x'_{\overline{T}} - w_{\overline{T}}\|_2^2 + \|(x' - (x + w))_T\|_2^2 &= \|x' - (x + w)\|_2^2 \\
\|x'_{\overline{T}} - w_{\overline{T}}\|_2^2 + 4\epsilon\|w\|_2^2 &\le (1 + \epsilon)\|w\|_2^2 \\
\|x'_{\overline{T}} - w_{\overline{T}}\|_2^2 &\le (1 - 3\epsilon)\|w\|_2^2 \le (1 - 2\epsilon)\|w\|_2^2
\end{aligned}$$

as desired. Thus with $3/4$ probability, at least one of (12) and (13) is true.

Suppose Equation (13) holds with at least $1/4$ probability. There must be some $x$ and $S$ such that the same equation holds with $1/4$ probability. For this $S$, given $x'$ we can find $T$ and thus $x'_{\overline{T}}$. Hence for a uniform Gaussian $w_{\overline{T}}$, given $Aw_{\overline{T}}$ we can compute $A(x + w_{\overline{T}})$ and recover $x'_{\overline{T}}$ with $\|x'_{\overline{T}} - w_{\overline{T}}\|_2^2 \le (1 - \epsilon)\|w_{\overline{T}}\|_2^2$. By Lemma 4.2 this is impossible, since $n - |T| = \Omega(\frac{1}{\epsilon^2})$ and $m = O(\epsilon n)$ by assumption.

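Therefore Equation (12) holds with at least $1/2$ probability, namely $\|x'_T - x\|_2^2 \le 9\epsilon\|w\|_2^2 \le 9\epsilon(1 + \epsilon)\alpha k < k/4$ for appropriate $\alpha = \Theta(1/\epsilon)$. But if the nearest $\hat{x} \in X$ to $x'$ has $\mathrm{supp}(\hat{x}) \ne S$, then $\|x - \hat{x}\|_2^2 \ge |S \triangle S'| \ge k$, and since $x$ and $\hat{x}$ are both supported on $T$,

$$\begin{aligned}
\|x' - \hat{x}\|_2^2 &= \|x'_{\overline{T}}\|_2^2 + \|x'_T - \hat{x}\|_2^2 \ge \|x'_{\overline{T}}\|_2^2 + \big(\|x - \hat{x}\|_2 - \|x'_T - x\|_2\big)^2 \\
&> \|x'_{\overline{T}}\|_2^2 + \big(\sqrt{k} - \sqrt{k}/2\big)^2 > \|x'_{\overline{T}}\|_2^2 + \|x'_T - x\|_2^2 = \|x' - x\|_2^2,
\end{aligned}$$

a contradiction. Hence $S' = S$ whenever (12) holds, so $\Pr[S' \ne S] \le 1/2$. But Fano's inequality states $H(S \mid S') \le 1 + \Pr[S' \ne S]\log|\mathcal{F}|$ and hence

$$I(S; S') = H(S) - H(S \mid S') \ge -1 + \frac{1}{2}\log|\mathcal{F}| = \Omega(k\log(n/k))$$

as desired. □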
Theorem 4.4. Any $(1+\epsilon)$-approximate $\ell_2/\ell_2$ recovery scheme with $\epsilon > \sqrt{\frac{k\log n}{n}}$ and failure probability $\delta < 1/2$ requires $m = \Omega(\frac{1}{\epsilon}k\log(n/k))$.

Proof. Combine Lemmas 4.3 and 4.1 with $\alpha = 1/\epsilon$ to get $m = \Omega(\frac{k\log(n/k)}{\log(1 + \epsilon)}) = \Omega(\frac{1}{\epsilon}k\log(n/k))$, $m = \Omega(\epsilon n)$, or $n = O(\frac{1}{\epsilon^2} + \frac{k}{\epsilon}\log(k/\epsilon))$. For $\epsilon$ as in the theorem statement, the first bound is controlling. □

5 Bit complexity to measurement complexity 

The remaining lower bounds proceed by reductions from communication complexity. The following lemma (implicit in [DIPW10]) shows that lower bounding the number of bits for approximate recovery is sufficient to lower bound the number of measurements. Let $B_p^n(R) \subset \mathbb{R}^n$ denote the $\ell_p$ ball of radius $R$.

Definition 5.1. Let $X \subset \mathbb{R}^n$ be a distribution with $x_i \in \{-n^d, \ldots, n^d\}$ for all $i \in [n]$ and $x \in X$. We define a $(1+\epsilon)$-approximate $\ell_p/\ell_p$ sparse recovery bit scheme on $X$ with $b$ bits, precision $n^{-c}$, and failure probability $\delta$ to be a deterministic pair of functions $f: X \to \{0, 1\}^b$ and $g: \{0, 1\}^b \to \mathbb{R}^n$ where $f$ is linear so that $f(a + b)$ can be computed from $f(a)$ and $f(b)$. We require that, for $u \in B_p^n(n^{-c})$ uniformly and $x$ drawn from $X$, $g(f(x))$ is a valid result of $(1+\epsilon)$-approximate recovery on $x + u$ with probability $1 - \delta$.

Lemma 5.2. A lower bound of $\Omega(b)$ bits for such a sparse recovery bit scheme with $p \le 2$ implies a lower bound of $\Omega(b/((1 + c + d)\log n))$ measurements for regular $(1+\epsilon)$-approximate sparse recovery with failure probability $\delta - 1/n$.

Proof. Suppose we have a standard $(1+\epsilon)$-approximate sparse recovery algorithm $\mathcal{A}$ with failure probability $\delta$ using $m$ measurements $Ax$. We will use this to construct a (randomized) sparse recovery bit scheme using $O(m(1 + c + d)\log n)$ bits and failure probability $\delta + 1/n$. Then by averaging, some deterministic sparse recovery bit scheme performs better than average over the input distribution.

We may assume that $A \in \mathbb{R}^{m \times n}$ has orthonormal rows (otherwise, if $A = U\Sigma V^T$ is its singular value decomposition, $\Sigma^+U^TA$ has this property and can be inverted before applying the algorithm). When applied to the distribution $X + u$ for $u$ uniform over $B_p^n(n^{-c})$, we may assume that $A$ and $\mathcal{A}$ are deterministic and fail with probability $\delta$ over their input.

Let $A'$ be $A$ rounded to $t\log n$ bits per entry for some parameter $t$. Let $x$ be chosen from $X$. By Lemma 5.1 of [DIPW10], for any $x$ we have $A'x = A(x - s)$ for some $s$ with $\|s\|_1 \le n^2 2^{-t\log n}\|x\|_1$, so $\|s\|_p \le n^{2.5 - t}\|x\|_p \le n^{3.5 + d - t}$. Let $u \in B_p^n(n^{5.5 + d - t})$ uniformly at random. With probability at least $1 - 1/n$, $u \in B_p^n((1 - 1/n^2)n^{5.5 + d - t})$ because the balls are similar so the ratio of volumes is $(1 - 1/n^2)^n > 1 - 1/n$. In this case $u - s \in B_p^n(n^{5.5 + d - t})$; hence the random variables $u$ and $u - s$ overlap in at least a $1 - 1/n$ fraction of their volumes, so $x - s + u$ and $x + u$ have statistical distance at most $1/n$. Therefore $\mathcal{A}(A(x + u)) = \mathcal{A}(A'x + Au)$ with probability at least $1 - 1/n$.

Now, $A'x$ uses only $(t + d + 1)\log n$ bits per entry, so we can set $f(x) = A'x$ for $b = m(t + d + 1)\log n$. Then we set $g(y) = \mathcal{A}(y + Au)$ for uniformly random $u \in B_p^n(n^{5.5 + d - t})$. Setting $t = 5.5 + d + c$, this gives a sparse recovery bit scheme using $b = m(6.5 + 2d + c)\log n$ bits. □
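The rounding step and the final bit count can be sketched as follows. This minimal Python example assumes $\log = \log_2$. It rounds each entry to $\lfloor t\log_2 n\rfloor$ fractional bits, which keeps the entrywise error below $n^{-t}$. `round_matrix` and `bits_for_scheme` are illustrative names, and the orthonormalization of $A$ is not shown.

```python
import math
import random


def round_matrix(A, n, t):
    """Round every entry of A to floor(t * log2 n) fractional bits (the A' above)."""
    scale = 2.0 ** math.floor(t * math.log2(n))
    return [[round(a * scale) / scale for a in row] for row in A]


def bits_for_scheme(m, n, c, d):
    """b = m (t + d + 1) log2 n with t = 5.5 + d + c, i.e. m (6.5 + 2d + c) log2 n."""
    t = 5.5 + d + c
    return m * (t + d + 1) * math.log2(n)


rng = random.Random(4)
n, t = 64, 2
A = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(5)]
A_rounded = round_matrix(A, n, t)
max_err = max(abs(a - b) for ra, rb in zip(A, A_rounded) for a, b in zip(ra, rb))
```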

6 Non-sparse output Lower Bound for p = 1 

First, we show that recovering the locations of an $\epsilon$ fraction of $d$ ones in a vector of size $n > d/\epsilon$ requires $\tilde{\Omega}(\epsilon d)$ bits. Then, we show high bit complexity of a distributional product version of the Gap-$\ell_\infty$ problem. Finally, we create a distribution for which successful sparse recovery must solve one of the previous problems, giving a lower bound in bit complexity. Lemma 5.2 converts the bit complexity to measurement complexity.
6.1 $\ell_1$ lower bound for recovering noise bits

Definition 6.1. We say a set $C \subset [q]^d$ is a $(d, q, \epsilon)$ code if any two distinct $c, c' \in C$ agree in at most $\epsilon d$ positions. We say a set $X \subset \{0, 1\}^{dq}$ represents $C$ if $X$ is $C$ concatenated with the trivial code $[q] \to \{0, 1\}^q$ given by $i \mapsto e_i$.

Claim 6.2. For $\epsilon \ge 2/q$, there exist $(d, q, \epsilon)$ codes $C$ of size $q^{\Omega(\epsilon d)}$ by the Gilbert-Varshamov bound (details in [DIPW10]).

Lemma 6.3. Let $X \subset \{0, 1\}^{dq}$ represent a $(d, q, \epsilon)$ code. Suppose $y \in \mathbb{R}^{dq}$ satisfies $\|y - x\|_1 < (1 - \epsilon)\|x\|_1$. Then we can recover $x$ uniquely from $y$.

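Proof. We assume $y_i \in [0, 1]$ for all $i$; thresholding otherwise decreases $\|y - x\|_1$. We will show that there exists no other $x' \in X$ with $\|y - x'\|_1 < (1 - \epsilon)\|x\|_1$; thus choosing the nearest element of $X$ is a unique decoder. Suppose otherwise, and let $S = \mathrm{supp}(x)$, $T = \mathrm{supp}(x')$. Then

$$\begin{aligned}
(1 - \epsilon)\|x\|_1 &> \|x - y\|_1 = \|x\|_1 - \|y_S\|_1 + \|y_{\overline{S}}\|_1 \\
\|y_S\|_1 &> \|y_{\overline{S}}\|_1 + \epsilon d.
\end{aligned}$$

Since the same is true relative to $x'$ and $T$, we have

$$\begin{aligned}
\|y_S\|_1 + \|y_T\|_1 &> \|y_{\overline{S}}\|_1 + \|y_{\overline{T}}\|_1 + 2\epsilon d \\
2\|y_{S \cap T}\|_1 &> 2\|y_{\overline{S \cup T}}\|_1 + 2\epsilon d \\
\|y_{S \cap T}\|_1 &> \epsilon d \\
|S \cap T| &> \epsilon d.
\end{aligned}$$

This violates the distance of the code represented by $X$. □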
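The decoder in Lemma 6.3 is nearest-codeword search in $\ell_1$ after clipping $y$ to $[0,1]$. Below is a minimal Python sketch. As a stand-in code it uses the evaluations of all polynomials of degree at most 1 over $\mathbb{Z}_5$ at 4 points. This is a $(4, 5, 1/4)$ code, since two distinct such polynomials agree on at most one point. The helper names are hypothetical.

```python
from itertools import combinations


def represent(codeword, q):
    """Concatenate one-hot encodings: symbol a in [q] becomes e_a in {0,1}^q."""
    x = []
    for sym in codeword:
        x.extend(1.0 if j == sym else 0.0 for j in range(q))
    return x


def decode(y, code, q):
    """Clip y to [0, 1], then return the codeword whose representation is l_1-nearest."""
    y = [min(max(v, 0.0), 1.0) for v in y]
    return min(code, key=lambda c: sum(abs(a - b) for a, b in zip(represent(c, q), y)))


def max_agreement(code):
    """Largest number of positions on which two distinct codewords agree."""
    return max(sum(a == b for a, b in zip(c1, c2)) for c1, c2 in combinations(code, 2))


q, d = 5, 4
code = [tuple((a + b * j) % q for j in range(d)) for a in range(q) for b in range(q)]
x = represent(code[7], q)                 # code[7] = (1, 3, 0, 2)
y = list(x)
y[1], y[8] = 0.4, 0.4                     # weaken two of the ones
y[0], y[19] = 0.5, 0.5                    # add mass on two zero positions
```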

Lemma 6.4. Let $R = [s, cs]$ for some constant $c$ and parameter $s$. Let $X$ be a permutation independent distribution over $\{0, 1\}^n$ with $\|x\|_1 \in R$ with probability $p$. If $y$ satisfies $\|x - y\|_1 < (1 - \epsilon)\|x\|_1$ with probability $p'$ with $p' - (1 - p) = \Omega(1)$, then $I(x; y) = \Omega(\epsilon s\log(n/s))$.

Proof. For each integer $i \in R$, let $X_i \subset \{0, 1\}^n$ represent an $(i, n/i, \epsilon)$ code. Let $p_i = \Pr_{x \in X}[\|x\|_1 = i]$. Let $S_n$ be the set of permutations of $[n]$. Then the distribution $X'$ given by (a) choosing $i \in R$ proportional to $p_i$, (b) choosing $\sigma \in S_n$ uniformly, (c) choosing $x_i \in X_i$ uniformly, and (d) outputting $x' = \sigma(x_i)$ is equal to the distribution $(x \in X \mid \|x\|_1 \in R)$.

Now, writing $\delta = p' - (1 - p) = \Omega(1)$, we have $p' = \Pr[\|x\|_1 \notin R] + \delta$, so $x'$ chosen from $X'$ satisfies $\|x' - y\|_1 < (1 - \epsilon)\|x'\|_1$ with probability at least $\delta$. Therefore, with at least $\delta/2$ probability, $i$ and $\sigma$ are such that $\|\sigma(x_i) - y\|_1 < (1 - \epsilon)\|\sigma(x_i)\|_1$ with $\delta/2$ probability over uniform $x_i \in X_i$. But given $y$ with $\|y - \sigma(x_i)\|_1$ small, we can compute $y' = \sigma^{-1}(y)$ with $\|y' - x_i\|_1$ equally small. Then by Lemma 6.3 we can recover $x_i$ from $y$ with probability $\delta/2$ over $x_i \in X_i$. Thus for this $i$ and $\sigma$, $I(x; y \mid i, \sigma) \ge \Omega(\delta\log|X_i|) = \Omega(\delta\epsilon s\log(n/s))$ by Fano's inequality. But then $I(x; y) = \mathbb{E}_{i,\sigma}[I(x; y \mid i, \sigma)] = \Omega(\delta^2\epsilon s\log(n/s)) = \Omega(\epsilon s\log(n/s))$. □

6.2 Distributional Indexed Gap $\ell_\infty$

Consider the following communication game, which we refer to as $\mathrm{Gap}\ell_\infty^B$, studied in [BYJKS04]. The legal instances are pairs $(x, y)$ of $m$-dimensional vectors, with $x_i, y_i \in \{0, 1, 2, \ldots, B\}$ for all $i$ such that

• NO instance: for all $i$, $y_i - x_i \in \{0, 1\}$, or

• YES instance: there is a unique $i$ for which $y_i - x_i = B$, and for all $j \ne i$, $y_j - x_j \in \{0, 1\}$.

The distributional communication complexity $D_{\sigma,\delta}(f)$ of a function $f$ is the minimum communication cost over all deterministic protocols computing $f$ with error probability at most $\delta$, where the probability is over inputs drawn from $\sigma$.

Consider the distribution $\sigma$ which chooses a random $i \in [m]$. Then for each $j \ne i$, it chooses a random $d \in \{0, \ldots, B\}$ and $(x_j, y_j)$ is uniform in $\{(d, d), (d, d+1)\}$. For coordinate $i$, $(x_i, y_i)$ is uniform in $\{(0, 0), (0, B)\}$. Using similar arguments to those in [BJKS04], Jayram [Jay02] showed $D_{\sigma,\delta}(\mathrm{Gap}\ell_\infty) = \Omega(m/B^2)$ (this is reference [70] on p. 182 of [BY02]) for $\delta$ less than a small constant.
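The distribution $\sigma$ can be sampled directly. The sketch below (hypothetical helper names `sample_sigma` and `gap_linf`) follows the description above literally, including $d \in \{0,\ldots,B\}$, and assumes $B \ge 2$ so that the special coordinate is the only one that can have difference $B$. Each draw then has exactly the YES/NO structure of $\mathrm{Gap}\ell_\infty$.

```python
import random


def sample_sigma(m, B, rng):
    """Draw one pair (x, y) of length-m vectors from sigma (assumes B >= 2)."""
    special = rng.randrange(m)
    x, y = [0] * m, [0] * m
    for j in range(m):
        if j == special:
            # Special coordinate: uniform over {(0, 0), (0, B)}.
            x[j], y[j] = rng.choice([(0, 0), (0, B)])
        else:
            # Other coordinates: d uniform in {0, ..., B}, then
            # (x_j, y_j) uniform over {(d, d), (d, d + 1)}.
            d = rng.randint(0, B)
            x[j], y[j] = rng.choice([(d, d), (d, d + 1)])
    return x, y


def gap_linf(x, y, B):
    """Answer Gap-ell_infty on a legal instance: YES iff some y_j - x_j == B."""
    return "YES" if any(b - a == B for a, b in zip(x, y)) else "NO"


if __name__ == "__main__":
    rng = random.Random(1)
    answers = [gap_linf(*sample_sigma(10, 4, rng), 4) for _ in range(2000)]
    print(answers.count("YES") / len(answers))  # close to 1/2
```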

We define the one-way distributional communication complexity $D^{1\text{-way}}_{\sigma,\delta}(f)$ of a function $f$ to be the smallest distributional complexity of a protocol for $f$ in which only a single message is sent from Alice to Bob.

Definition 6.5 (Indexed $\mathrm{Ind}^{r,B}_\infty$ Problem). There are $r$ pairs of inputs $(x^1, y^1), (x^2, y^2), \ldots, (x^r, y^r)$ such that every pair $(x^i, y^i)$ is a legal instance of the $\mathrm{Gap}\ell_\infty$ problem. Alice is given $x^1, \ldots, x^r$. Bob is given an index $I \in [r]$ and $y^1, \ldots, y^r$. The goal is to decide whether $(x^I, y^I)$ is a NO or a YES instance of $\mathrm{Gap}\ell_\infty$.

Let $\eta$ be the distribution $\sigma^r \times U_r$, where $U_r$ is the uniform distribution on $[r]$. We bound $D^{1\text{-way}}_{\eta,\delta}(\mathrm{Ind}^{r,B}_\infty)$ as follows. For a function $f$, let $f^r$ denote the problem of computing $r$ instances of $f$. For a distribution $\zeta$ on instances of $f$, let $D^{1\text{-way},*}_{\zeta^r,\delta}(f^r)$ denote the minimum communication cost of a deterministic one-way protocol computing $f$ with error probability at most $\delta$ in each of the $r$ copies of $f$, where the inputs come from $\zeta^r$.

Theorem 6.6 (special case of Corollary 2.5 of [BR11]). Assume $D_{\sigma,\delta}(f)$ is larger than a large enough constant. Then $D^{1\text{-way},*}_{\sigma^r,\delta/2}(f^r) = \Omega(r D_{\sigma,\delta}(f))$.

Theorem 6.7. For $\delta$ less than a sufficiently small constant, $D^{1\text{-way}}_{\eta,\delta}(\mathrm{Ind}^{r,B}_\infty) = \Omega(\delta^2 r m/(B^2\log r))$.

Proof. Consider a deterministic one-way protocol $\Pi$ for $\mathrm{Ind}^{r,B}_\infty$ with error probability $\delta$ on inputs drawn from $\eta$. Then for at least $r/2$ values $i \in [r]$, $\Pr[\Pi(x^1, \ldots, x^r, y^1, \ldots, y^r, I) = \mathrm{Gap}\ell_\infty(x^I, y^I) \mid I = i] \ge 1 - 2\delta$. Fix a set $S = \{i_1, \ldots, i_{r/2}\}$ of indices with this property. We build a deterministic one-way protocol $\Pi'$ for $f^{r/2}$, where $f = \mathrm{Gap}\ell_\infty$, with input distribution $\sigma^{r/2}$ and error probability at most $6\delta$ in each of the $r/2$ copies of $f$.

For each $\ell \in [r] \setminus S$, independently choose $(x^\ell, y^\ell) \sim \sigma$. For each $j \in [r/2]$, let $Z^1_j$ be the probability that $\Pi(x^1, \ldots, x^r, y^1, \ldots, y^r, I) = \mathrm{Gap}\ell_\infty(x^{i_j}, y^{i_j})$ given $I = i_j$ and the choice of $(x^\ell, y^\ell)$ for all $\ell \in [r] \setminus S$.

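If we repeat this experiment independently $s = O(\delta^{-2}\log r)$ times, obtaining independent $Z^1_j, \ldots, Z^s_j$, and let $Z_j = \sum_{t=1}^{s} Z^t_j$, then $\Pr[Z_j \ge s - s\cdot 3\delta] \ge 1 - 1/r$ by a Hoeffding bound, since each $Z^t_j \in [0,1]$ has mean at least $1 - 2\delta$. By a union bound over the $r/2$ values of $j$, there exists a set of $s = O(\delta^{-2}\log r)$ repetitions for which, for each $j \in [r/2]$, $Z_j \ge s - s\cdot 3\delta$. We hardwire these into $\Pi'$ to make the protocol deterministic.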

Given inputs $((X^1, \ldots, X^{r/2}), (Y^1, \ldots, Y^{r/2})) \sim \sigma^{r/2}$ to $\Pi'$, Alice and Bob run $s$ executions of $\Pi$, each with $x^{i_j} = X^j$ and $y^{i_j} = Y^j$ for all $j \in [r/2]$, filling in the remaining values using the hardwired inputs. Bob runs the algorithm specified by $\Pi$ for each $i_j \in S$ and each execution. His output for $(X^j, Y^j)$ is the majority of the outputs of the $s$ executions with index $i_j$.

Fix an index $i_j$. Let $W$ be the number of repetitions for which $\mathrm{Gap}\ell_\infty(X^j, Y^j)$ does not equal the output of $\Pi$ on input $i_j$, for a random $(X^j, Y^j) \sim \sigma$. Then $\mathbb{E}[W] \le 3\delta s$. By a Markov bound, $\Pr[W \ge s/2] \le 6\delta$, and so the coordinate is correct with probability at least $1 - 6\delta$.

The communication of $\Pi'$ is a factor $s = \Theta(\delta^{-2}\log r)$ more than that of $\Pi$. The theorem now follows by Theorem 6.6, using that $D_{\sigma,12\delta}(\mathrm{Gap}\ell_\infty) = \Omega(m/B^2)$. □
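To see the size of the amplification concretely, the sketch below computes the number of repetitions $s$ suggested by Hoeffding's inequality, $\exp(-2s\delta^2) \le 1/r$, and checks the resulting lower tail numerically in the special case where each $Z^t_j$ is a 0/1 variable with mean exactly $1 - 2\delta$. That special case is an illustrative assumption; in the proof $Z^t_j$ is any $[0,1]$-valued variable with mean at least $1 - 2\delta$. The helper names are hypothetical. The majority step afterwards is the Markov bound $\Pr[W \ge s/2] \le \mathbb{E}[W]/(s/2) \le 6\delta$.

```python
import math


def repetitions(delta, r):
    """Smallest s with exp(-2 * s * delta**2) <= 1/r (Hoeffding), i.e. s = O(delta^-2 log r)."""
    return math.ceil(math.log(r) / (2 * delta ** 2))


def binom_pmf(s, p, k):
    """Pr[Bin(s, p) = k], computed in log space to avoid overflow."""
    log_c = math.lgamma(s + 1) - math.lgamma(k + 1) - math.lgamma(s - k + 1)
    return math.exp(log_c + k * math.log(p) + (s - k) * math.log1p(-p))


def lower_tail(s, p, threshold):
    """Pr[Bin(s, p) < threshold]."""
    return sum(binom_pmf(s, p, k) for k in range(s + 1) if k < threshold)


if __name__ == "__main__":
    delta, r = 0.05, 100
    s = repetitions(delta, r)
    # Illustration only: each Z_j^t replaced by a 0/1 variable with mean 1 - 2*delta.
    tail = lower_tail(s, 1 - 2 * delta, s * (1 - 3 * delta))
    print(s, tail, tail <= 1 / r)
```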

6.3 Lower bound for sparse recovery 

Fix the parameters $B = \Theta(1/\varepsilon^{1/2})$, $r = k$, $m = 1/\varepsilon^{3/2}$, and $n = k/\varepsilon^3$. Given an instance $(x^1, y^1), \ldots, (x^r, y^r)$ of $\mathrm{Ind}^{r,B}_\infty$, we define the input signal $z$ to a sparse recovery problem. We allocate a set $S^i$ of $m$ disjoint coordinates in a universe of size $n$ for each pair $(x^i, y^i)$, and on these coordinates place the vector $y^i - x^i$. The locations are important for arguing that the sparse recovery algorithm cannot learn much information about the noise, and will be placed uniformly at random.

Let $\mu$ denote the induced distribution on $z$. Fix a $(1+\varepsilon)$-approximate $k$-sparse recovery bit scheme Alg that takes $b$ bits as input and succeeds with probability at least $1 - \delta/2$ over $z \sim \mu$ for some small constant $\delta$. Let $S$ be the set of top $k$ coordinates in $z$. Alg has the guarantee that if it succeeds for $z \sim \mu$, then there exists a small $u$ with $\|u\|_1 < 1/n^2$ so that $v = \mathrm{Alg}(z)$ satisfies

$$\|v - z - u\|_1 \le (1+\varepsilon)\|(z + u)_{[n]\setminus S}\|_1$$

$$\|v - z\|_1 \le (1+\varepsilon)\|z_{[n]\setminus S}\|_1 + (2+\varepsilon)/n^2$$

$$\le (1+2\varepsilon)\|z_{[n]\setminus S}\|_1$$

and thus

$$\|(v - z)_S\|_1 + \|(v - z)_{[n]\setminus S}\|_1 \le (1+2\varepsilon)\|z_{[n]\setminus S}\|_1. \qquad (14)$$

Lemma 6.8. For $B = \Theta(1/\varepsilon^{1/2})$ sufficiently large, suppose that $\Pr_{z\sim\mu}[\|(v - z)_S\|_1 \le 10\varepsilon\cdot\|z_{[n]\setminus S}\|_1] \ge 1 - \delta$. Then Alg requires $b = \Omega(k/(\varepsilon^{1/2}\log k))$.

Proof. We show how to use Alg to solve instances of $\mathrm{Ind}^{r,B}_\infty$ with probability at least $1 - \zeta$ for some small $\zeta$, where the probability is over input instances to $\mathrm{Ind}^{r,B}_\infty$ distributed according to $\eta$, inducing the distribution $\mu$. The lower bound will follow by Theorem 6.7. Since Alg is a deterministic sparse recovery bit scheme, it receives a sketch $f(z)$ of the input signal $z$ and runs an arbitrary recovery algorithm $g$ on $f(z)$ to determine its output $v = \mathrm{Alg}(z)$.

Given $x^1, \ldots, x^r$, for each $i = 1, 2, \ldots, r$, Alice places $-x^i$ on the appropriate coordinates in the block $S^i$ used in defining $z$, obtaining a vector $z_{\mathrm{Alice}}$, and transmits $f(z_{\mathrm{Alice}})$ to Bob. Bob uses his inputs $y^1, \ldots, y^r$ to place $y^i$ on the appropriate coordinates in $S^i$. He thus creates a vector $z_{\mathrm{Bob}}$ for which $z_{\mathrm{Alice}} + z_{\mathrm{Bob}} = z$. Given $f(z_{\mathrm{Alice}})$, Bob computes $f(z)$ from $f(z_{\mathrm{Alice}})$ and $f(z_{\mathrm{Bob}})$ (using that the sketch is linear), then $v = \mathrm{Alg}(z)$. We assume all coordinates of $v$ are rounded to the real interval $[0, B]$, as this can only decrease the error.
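A minimal sketch of how the input signal is split between the players, assuming as a stand-in for the scheme's sketch $f$ a linear map $f(z) = Az$ with a small random integer matrix; the helper names and toy sizes are hypothetical. It illustrates that Alice's and Bob's contributions add up to $z = y^i - x^i$ on each block, so Bob can form $f(z)$ from the message $f(z_{\mathrm{Alice}})$ and his own $f(z_{\mathrm{Bob}})$.

```python
import random


def place_blocks(n, r, m, rng):
    """Choose r disjoint blocks S^1, ..., S^r of m coordinates each, uniformly at random in [n]."""
    coords = rng.sample(range(n), r * m)
    return [coords[i * m:(i + 1) * m] for i in range(r)]


def embed(n, blocks, vectors, sign):
    """Length-n vector with sign * vectors[i] placed on block i and zeros elsewhere."""
    z = [0] * n
    for block, vec in zip(blocks, vectors):
        for coord, val in zip(block, vec):
            z[coord] = sign * val
    return z


def sketch(A, z):
    """Linear sketch f(z) = A z (a stand-in for the scheme's measurements)."""
    return [sum(a * b for a, b in zip(row, z)) for row in A]


if __name__ == "__main__":
    rng = random.Random(2)
    n, r, m, B = 60, 3, 5, 4
    xs = [[rng.randint(0, B - 1) for _ in range(m)] for _ in range(r)]
    ys = [[xv + rng.randint(0, 1) for xv in row] for row in xs]
    blocks = place_blocks(n, r, m, rng)
    z_alice = embed(n, blocks, xs, -1)  # Alice places -x^i on S^i
    z_bob = embed(n, blocks, ys, +1)    # Bob places y^i on S^i
    z = [a + b for a, b in zip(z_alice, z_bob)]
    A = [[rng.randint(-3, 3) for _ in range(n)] for _ in range(8)]
    # Bob adds his own sketch to the message f(z_Alice) and obtains f(z).
    combined = [a + b for a, b in zip(sketch(A, z_alice), sketch(A, z_bob))]
    print(combined == sketch(A, z))
```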

We say that $S^i$ is bad if either

• there is no coordinate $j$ in $S^i$ for which $|v_j| \ge B/2$ yet $(x^i, y^i)$ is a YES instance of $\mathrm{Gap}\ell_\infty$, or

• there is a coordinate $j$ in $S^i$ for which $|v_j| \ge B/2$ yet either $(x^i, y^i)$ is a NO instance of $\mathrm{Gap}\ell_\infty$ or $j$ is not the unique $j^*$ for which $y^i_{j^*} - x^i_{j^*} = B$.

The $\ell_1$-error incurred by a bad block is at least $B/2 - 1$. Hence, if there are $t$ bad blocks, the total error is at least $t(B/2 - 1)$, which must be smaller than $10\varepsilon\cdot\|z_{[n]\setminus S}\|_1$ with probability $1 - \delta$. Suppose this happens.

We bound $t$. All coordinates in $z_{[n]\setminus S}$ have value in the set $\{0, 1\}$. Hence, $\|z_{[n]\setminus S}\|_1 < rm$. So $t < 20\varepsilon rm/(B - 2)$. For $B \ge 6$, $t < 30\varepsilon rm/B$. Plugging in $r$, $m$ and $B$, $t < \zeta k$, where $\zeta > 0$ is a constant that can be made arbitrarily small by increasing $B = \Theta(1/\varepsilon^{1/2})$.
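The arithmetic behind the bound on $t$, writing $B = \kappa\,\varepsilon^{-1/2}$ for a constant $\kappa$ (a name introduced here only for illustration) and using the parameter choices $r = k$, $m = \varepsilon^{-3/2}$ fixed at the start of this subsection, is:

```latex
\begin{aligned}
t\Big(\frac{B}{2} - 1\Big) \le 10\varepsilon\,\|z_{[n]\setminus S}\|_1 < 10\varepsilon rm
  &\;\Longrightarrow\; t < \frac{20\varepsilon rm}{B - 2},\\
B \ge 6 \;\Longrightarrow\; B - 2 \ge \tfrac{2}{3}B
  &\;\Longrightarrow\; t < \frac{30\varepsilon rm}{B},\\
r = k,\ m = \varepsilon^{-3/2},\ B = \kappa\,\varepsilon^{-1/2}
  &\;\Longrightarrow\; t < \frac{30\,\varepsilon\,k\,\varepsilon^{-3/2}}{\kappa\,\varepsilon^{-1/2}} = \frac{30}{\kappa}\,k,
\end{aligned}
```

so $\zeta = 30/\kappa$ can be made as small as desired by enlarging $\kappa$.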

If a block $S^i$ is not bad, then it can be used to solve $\mathrm{Gap}\ell_\infty$ on $(x^i, y^i)$ with probability 1. Bob declares that $(x^i, y^i)$ is a YES instance if and only if there is a coordinate $j$ in $S^i$ for which $|v_j| \ge B/2$.
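Bob's decision rule and the notion of a bad block can be written out as follows. This is a sketch with hypothetical function names: `v` is the (rounded) output of Alg indexed by coordinate, and each block is given as the list of its coordinates, with `x`, `y` listed in the same order. A block that is not bad is always answered correctly by `bob_decides`.

```python
def round_to_range(v, B):
    """Round every coordinate of v into [0, B]; against a signal whose
    coordinates lie in [0, B], this can only reduce the error."""
    return [min(float(B), max(0.0, t)) for t in v]


def bob_decides(v, block, B):
    """Declare YES iff some coordinate j in the block has |v_j| >= B/2."""
    return "YES" if any(abs(v[j]) >= B / 2 for j in block) else "NO"


def is_bad(v, block, x, y, B):
    """Whether block S^i is bad for the instance (x, y) placed on it."""
    diffs = [b - a for a, b in zip(x, y)]
    is_yes = B in diffs
    flagged = [pos for pos, j in enumerate(block) if abs(v[j]) >= B / 2]
    if is_yes and not flagged:
        return True
    # Any flagged coordinate must be the unique j* with y_{j*} - x_{j*} = B.
    return any(not is_yes or diffs[pos] != B for pos in flagged)
```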


Since Bob's index $I$ is uniform on the $r$ blocks in $\mathrm{Ind}^{r,B}_\infty$, with probability at least $1 - \zeta$ the players solve $\mathrm{Ind}^{r,B}_\infty$ given that the $\ell_1$ error is small. Therefore they solve $\mathrm{Ind}^{r,B}_\infty$ with probability $1 - \delta - \zeta$ overall. By Theorem 6.7, for $\zeta$ and $\delta$ sufficiently small, Alg requires $\Omega(rm/(B^2\log r)) = \Omega(k/(\varepsilon^{1/2}\log k))$ bits.

□

Lemma 6.9. Suppose $\Pr_{z\sim\mu}[\|(v - z)_{[n]\setminus S}\|_1 \le (1 - 8\varepsilon)\cdot\|z_{[n]\setminus S}\|_1] \ge \delta/2$. Then Alg requires $b = \Omega(\frac{1}{\sqrt{\varepsilon}}k\log(1/\varepsilon))$.

Proof. The distribution $\mu$ consists of $\mathcal{B}(mr, 1/2)$ ones placed uniformly throughout the $n$ coordinates, where $\mathcal{B}(mr, 1/2)$ denotes the binomial distribution with $mr$ events of $1/2$ probability each. Therefore with probability at least $1 - \delta/4$, the number of ones lies in $[\delta mr/8, (1 - \delta/8)mr]$. Thus by Lemma 6.4 (with $p' = \delta/2$ and $1 - p \le \delta/4$), $I(v; z) \ge \Omega(\varepsilon mr\log(n/(mr)))$. Since the mutual information only passes through a $b$-bit string, $b = \Omega(\varepsilon mr\log(n/(mr)))$ as well, which with $mr = k/\varepsilon^{3/2}$ and $n/(mr) = \varepsilon^{-3/2}$ is the claimed bound. □
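The concentration claim used to invoke Lemma 6.4 can be sanity-checked numerically. The sketch below, with hypothetical helper names, computes exactly how much of the $\mathcal{B}(mr, 1/2)$ mass falls outside $[\delta mr/8, (1-\delta/8)mr]$; for constant $\delta$ this is far below $\delta/4$ once $mr$ is moderately large.

```python
import math


def binom_half_pmf(N, k):
    """Pr[Bin(N, 1/2) = k], computed exactly with integers and then divided."""
    return math.comb(N, k) / 2 ** N


def mass_outside(N, delta):
    """Pr[Bin(N, 1/2) lies outside [delta*N/8, (1 - delta/8)*N]]."""
    lo, hi = delta * N / 8, (1 - delta / 8) * N
    return sum(binom_half_pmf(N, k) for k in range(N + 1) if k < lo or k > hi)


if __name__ == "__main__":
    delta = 0.1
    for N in (8, 50, 400):
        print(N, mass_outside(N, delta), mass_outside(N, delta) <= delta / 4)
```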

Theorem 6.10. Any $(1+\varepsilon)$-approximate $\ell_1/\ell_1$ recovery scheme with sufficiently small constant failure probability $\delta$ must make $\Omega(\frac{1}{\sqrt{\varepsilon}}k/\log^2(k/\varepsilon))$ measurements.

Proof. We will lower bound any $\ell_1/\ell_1$ sparse recovery bit scheme Alg. If Alg succeeds, then in order to satisfy inequality (14), we must either have $\|(v - z)_S\|_1 \le 10\varepsilon\cdot\|z_{[n]\setminus S}\|_1$ or we must have $\|(v - z)_{[n]\setminus S}\|_1 \le (1 - 8\varepsilon)\cdot\|z_{[n]\setminus S}\|_1$. Since Alg succeeds with probability at least $1 - \delta/2$, it must either satisfy the hypothesis of Lemma 6.8 or the hypothesis of Lemma 6.9. But by these two lemmas, it follows that $b = \Omega(\frac{1}{\sqrt{\varepsilon}}k/\log k)$. Therefore by Lemma 5.2, any $(1+\varepsilon)$-approximate $\ell_1/\ell_1$ sparse recovery algorithm requires $\Omega(\frac{1}{\sqrt{\varepsilon}}k/\log^2(k/\varepsilon))$ measurements. □
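The dichotomy used in the proof is a one-line consequence of (14). Spelled out, together with the probability bookkeeping (assuming, as in the definition of Alg above, that Alg succeeds with probability at least $1 - \delta/2$):

```latex
\begin{aligned}
&\|(v-z)_S\|_1 > 10\varepsilon\,\|z_{[n]\setminus S}\|_1
 \;\overset{(14)}{\Longrightarrow}\;
 \|(v-z)_{[n]\setminus S}\|_1 < (1 + 2\varepsilon - 10\varepsilon)\|z_{[n]\setminus S}\|_1
 = (1 - 8\varepsilon)\|z_{[n]\setminus S}\|_1,\\
&\Pr[\text{hypothesis event of Lemma 6.8 fails}] > \delta
 \;\Longrightarrow\;
 \Pr[\text{hypothesis event of Lemma 6.9}] > \delta - \frac{\delta}{2} = \frac{\delta}{2}.
\end{aligned}
```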



7 Lower bounds for $k$-sparse output

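Theorem 7.1. Any $(1+\varepsilon)$-approximate $\ell_1/\ell_1$ recovery scheme with $k$-sparse output and failure probability $\delta$ requires $m = \Omega(\frac{1}{\varepsilon}(k\log\frac{1}{\varepsilon} + \log\frac{1}{\delta}))$, for $32 \le \frac{1}{\delta} \le n\varepsilon^2/k$.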

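Theorem 7.2. Any $(1+\varepsilon)$-approximate $\ell_2/\ell_2$ recovery scheme with $k$-sparse output and failure probability $\delta$ requires $m = \Omega(\frac{1}{\varepsilon^2}(k + \log\frac{1}{\delta}))$, for $32 \le \frac{1}{\delta} \le n\varepsilon^2/k$.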

These two theorems correspond to four statements: one for large $k$ and one for small $\delta$, for both $\ell_1$ and $\ell_2$.

All the lower bounds proceed by reductions from communication complexity. The following lemma (implicit in [DIPW10]) shows that lower bounding the number of bits for approximate recovery is sufficient to lower bound the number of measurements.

Lemma 7.3. Let $p \in \{1, 2\}$ and $\alpha = \Omega(1) < 1$. Suppose $X \subset \mathbb{R}^n$ has $\|x\|_p \le D$ and $\|x\|_\infty \le D'$ for all $x \in X$, and all coefficients of elements of $X$ are expressible in $O(\log n)$ bits. Further suppose that we have a recovery algorithm that, for any $\nu$ with $\|\nu\|_p < \alpha D$ and $\|\nu\|_\infty < \alpha D'$, recovers $x \in X$ from $A(x + \nu)$ with constant probability. Then $A$ must have $\Omega(\log|X|)$ measurements.

Proof. First, we may assume that $A \in \mathbb{R}^{m\times n}$ has orthonormal rows (otherwise, if $A = U\Sigma V^T$ is its singular value decomposition, $\Sigma^{-1}U^TA$ has this property and can be inverted before applying the algorithm). Let $A'$ be $A$ rounded to $c\log n$ bits per entry. By Lemma 5.1 of [DIPW10], for any $v$ we have $A'v = A(v - s)$ for some $s$ with $\|s\|_1 \le n^2 2^{-c\log n}\|v\|_1$, so $\|s\|_p \le n^{2.5-c}\|v\|_p$.

Suppose Alice has a bit string of length $r\log|X|$ for $r = \Theta(\log n)$. By splitting into $r$ blocks, this corresponds to $x_1, \ldots, x_r \in X$. Let $\beta$ be a power of 2 between $\alpha/4$ and $\alpha/2$, and define

$$z_j = \sum_{i=j}^{r}\beta^i x_i.$$

Alice sends $A'z_1$ to Bob; this is $O(m\log n)$ bits. Bob will solve the augmented indexing problem: given $A'z_1$, arbitrary $j \in [r]$, and $x_1, \ldots, x_{j-1}$, he must find $x_j$ with constant probability. This requires $A'z_1$ to have $\Omega(r\log|X|)$ bits, giving the result.
Bob receives $A'z_1 = A(z_1 + s)$ for $\|s\|_p \le n^{2.5-c}\|z_1\|_p \le n^{2.5-c}D$. Bob then chooses $u \in B_p(n^{4.5-c}D)$ uniformly at random. With probability at least $1 - 1/n$, $u \in B_p((1 - 1/n^2)n^{4.5-c}D)$ by a volume argument. In this case $u + s \in B_p(n^{4.5-c}D)$; hence the random variables $u$ and $u + s$ overlap in at least a $1 - 1/n$ fraction of their volumes, so $z_j + s + u$ and $z_j + u$ have statistical distance at most $1/n$. The distribution of $z_j + u$ is independent of $A$ (unlike $z_j + s$), so running the recovery algorithm on $A(z_j + s + u)$ succeeds with constant probability as well.
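The volume argument can be made explicit. For $u$ uniform in the $\ell_p$ ball $B_p(R)$ of $\mathbb{R}^n$, with $R$ the radius of the ball from which $u$ is drawn, volumes scale as the $n$-th power of the radius, so by Bernoulli's inequality:

```latex
\Pr\Big[u \in B_p\big((1 - \tfrac{1}{n^2})R\big)\Big]
  = \frac{\operatorname{vol} B_p\big((1 - \tfrac{1}{n^2})R\big)}{\operatorname{vol} B_p(R)}
  = \Big(1 - \frac{1}{n^2}\Big)^{n}
  \ge 1 - \frac{n}{n^2} = 1 - \frac{1}{n}.
```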

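We also have $\|z_j\|_p \le \frac{\beta^j - \beta^{r+1}}{1 - \beta}D \le 2(\beta^j - \beta^{r+1})D$. Since $r = O(\log n)$ and $\beta$ is a constant, $\beta^{r+1} = n^{-O(1)}$, so there exists a $c = O(1)$ with $n^{2.5-c} + n^{4.5-c} < 2\beta^{r+1}$; together with $2\beta^j \le \alpha\beta^{j-1}$ this gives

$$\|z_j + s + u\|_p \le (2\beta^j + n^{2.5-c} + n^{4.5-c} - 2\beta^{r+1})D < \alpha\beta^{j-1}D$$

for all $j$.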

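Therefore, given $x_1, \ldots, x_{j-1}$, Bob can compute

$$\frac{1}{\beta^j}\Big(A'z_1 + Au - A\sum_{i<j}\beta^i x_i\Big) = A\Big(x_j + \frac{1}{\beta^j}(z_{j+1} + s + u)\Big) = A(x_j + y)$$

for some $y$ with $\|y\|_p < \alpha D$ (apply the previous bound with $j+1$ in place of $j$). Hence Bob can use the recovery algorithm to recover $x_j$ with constant probability. Therefore Bob can solve augmented indexing, so the message $A'z_1$ must have $\Omega(\log n\log|X|)$ bits, so $m = \Omega(\log|X|)$. □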
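The algebra behind Bob's step can be checked exactly with rational arithmetic, ignoring the rounding error $s$ and the dither $u$ (both enter the same way, through $A$). The sketch below is a toy illustration with hypothetical helper names and a random integer matrix standing in for $A$: it encodes $x_1, \ldots, x_r$ as $z_1 = \sum_i \beta^i x_i$ and verifies that subtracting the known prefix and rescaling by $\beta^{-j}$ yields $A(x_j + z_{j+1}/\beta^j)$.

```python
from fractions import Fraction
import random


def matvec(A, v):
    """Exact matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def weighted_sum(xs, beta, start, n):
    """sum_{i=start}^{len(xs)} beta^i x_i, with x_1 = xs[0] (1-indexed as in z_j)."""
    out = [Fraction(0)] * n
    for i in range(start, len(xs) + 1):
        for t in range(n):
            out[t] += beta ** i * xs[i - 1][t]
    return out


def bob_step(A, Az1, known, beta, n):
    """Given A z_1 and the known prefix x_1, ..., x_{j-1} (j = len(known) + 1),
    return (A z_1 - A sum_{i<j} beta^i x_i) / beta^j."""
    j = len(known) + 1
    prefix = matvec(A, weighted_sum(known, beta, 1, n))
    return [(a - b) / beta ** j for a, b in zip(Az1, prefix)]


if __name__ == "__main__":
    rng = random.Random(4)
    n, m, r = 6, 4, 3
    beta = Fraction(1, 8)  # a power of 2, as in the proof
    xs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(r)]
    A = [[rng.randint(-2, 2) for _ in range(n)] for _ in range(m)]
    Az1 = matvec(A, weighted_sum(xs, beta, 1, n))
    for j in range(1, r + 1):
        tail = weighted_sum(xs, beta, j + 1, n)  # z_{j+1}
        rhs = matvec(A, [xs[j - 1][t] + tail[t] / beta ** j for t in range(n)])
        assert bob_step(A, Az1, xs[: j - 1], beta, n) == rhs
    print("identity holds for every j")
```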

We will now prove another lemma that is useful for all four theorem statements. 

Let $x \in \{0,1\}^n$ be $k$-sparse with $\mathrm{supp}(x) \subseteq S$ for some known $S$. Let $\nu \in \mathbb{R}^n$ be a noise vector that roughly corresponds to having $O(k/\varepsilon^p)$ ones for $p \in \{1, 2\}$, all located outside of $S$. We consider under what circumstances we can use a $(1+\varepsilon)$-approximate $\ell_p/\ell_p$ recovery scheme to recover $\mathrm{supp}(x)$ from $A(x + \nu)$ with (say) 90% accuracy.

Lemma 7.4 shows that this is possible for $p = 1$ when $|S| \le O(k/\varepsilon)$ and for $p = 2$ when $|S| \le 2k$. The algorithm in both instances is to choose a parameter $\mu$ and perform sparse recovery on $A(x + \nu + z)$, where $z_i = \mu$ for $i \in S$ and $z_i = 0$ otherwise. The support of the result will be very close to $\mathrm{supp}(x)$.
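A toy sketch of the support-recovery procedure, under a loudly idealised assumption: the recovery scheme is replaced by the exact best $k$-term approximation (which trivially meets the $(1+\varepsilon)$ guarantee), and it is applied to the signal itself rather than to measurements $A(x + \nu + z)$. The function names and the instance are hypothetical; the point is only that adding $\mu \ge \|\nu\|_\infty$ on $S$ lifts the coordinates of $\mathrm{supp}(x)$ above all others.

```python
def top_k_support(y, k):
    """Support of the best k-term approximation of y (the k largest |y_i|).
    Idealised stand-in for a (1+eps)-approximate k-sparse recovery scheme."""
    order = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)
    return set(order[:k])


def recover_support(signal, S, k, mu):
    """Add mu on every coordinate of S, run (idealised) k-sparse recovery,
    and return the support of the result."""
    y = [val + (mu if i in S else 0.0) for i, val in enumerate(signal)]
    return top_k_support(y, k)


if __name__ == "__main__":
    # Hypothetical toy instance: x is 1 on U = {1, 4}, S = {0, ..., 5},
    # noise of size at most 0.2 on S and at most 1 = mu off S.
    k, mu = 2, 1.0
    S = set(range(6))
    x = [0, 1, 0, 0, 1, 0] + [0] * 6
    noise = [0.1, -0.1, 0.2, 0.0, 0.1, -0.2] + [1.0, -1.0, 0.5, 1.0, 0.0, -0.5]
    signal = [a + b for a, b in zip(x, noise)]
    print(sorted(recover_support(signal, S, k, mu)))  # [1, 4]
```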

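Lemma 7.4. Let $S \subseteq [n]$ have $|S| \le s$, and suppose $x \in \{0,1\}^n$ satisfies $\mathrm{supp}(x) \subseteq S$ and $\|x\|_1 = k$. Let $p \in \{1, 2\}$, and let $\nu \in \mathbb{R}^n$ satisfy $\|\nu_S\|_\infty \le \alpha$, $\|\nu\|_p^p \le r$, and $\|\nu\|_\infty \le D$ for some constants $\alpha < 1/4$ and $D = O(1)$. Suppose $A \in \mathbb{R}^{m\times n}$ is part of a $(1+\varepsilon)$-approximate $k$-sparse $\ell_p/\ell_p$ recovery scheme with failure probability $\delta$.

Then, given $A(x_S + \nu)$, Bob can with failure probability $\delta$ recover $\hat{x}_S$ that differs from $x_S$ in at most $k/c$ locations, as long as either

$$p = 1,\quad s = \Theta\Big(\frac{k}{c\varepsilon}\Big),\quad r = \Theta\Big(\frac{k}{c\varepsilon}\Big) \qquad (15)$$

or

$$p = 2,\quad s = 2k,\quad r = \Theta\Big(\frac{k}{c^2\varepsilon^2}\Big) \qquad (16)$$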

Proof. For some parameter $\mu \ge D$, let $z_i = \mu$ for $i \in S$ and $z_i = 0$ elsewhere. Consider $y = x_S + \nu + z$. Let $U = \mathrm{supp}(x_S)$ have size $k$. Let $V \subseteq [n]$ be the support of the result of running the recovery scheme on $Ay = A(x_S + \nu) + Az$. Then we have that $x_S + z$ is $\mu + 1$ over $U$, $\mu$ over $S \setminus U$, and zero elsewhere. Since $\|u + v\|_p^p \le p(\|u\|_p^p + \|v\|_p^p)$ for any $u$ and $v$, we have

$$\|y_{\bar U}\|_p^p \le p(\|(x_S + z)_{\bar U}\|_p^p + \|\nu\|_p^p)$$

$$\le p((s - k)\mu^p + r)$$

$$\le p(r + s\mu^p).$$
Since $\|\nu_S\|_\infty \le \alpha$ and $\|\nu\|_\infty \le D \le \mu$, we have

$$\min_{i\in U}|y_i| \ge \mu + 1 - \alpha$$

$$\max_{i\notin U}|y_i| \le \mu + \alpha.$$

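We then get

$$\|y_{\bar V}\|_p^p = \|y_{\bar U}\|_p^p + \|y_{U\setminus V}\|_p^p - \|y_{V\setminus U}\|_p^p$$

$$\ge \|y_{\bar U}\|_p^p + |V\setminus U|\big((\mu + 1 - \alpha)^p - (\mu + \alpha)^p\big)$$

$$= \|y_{\bar U}\|_p^p + |V\setminus U|(1 + (2p - 2)\mu)(1 - 2\alpha)$$

where the inequality uses $|U\setminus V| \ge |V\setminus U|$ (as $|V| \le k = |U|$) and the last step can be checked for $p \in \{1, 2\}$. So

$$|V\setminus U| \le \frac{\|y_{\bar V}\|_p^p - \|y_{\bar U}\|_p^p}{(1 + (2p - 2)\mu)(1 - 2\alpha)}.$$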

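However, $V$ is the support of the result $\hat y$ of $(1+\varepsilon)$-approximate recovery, so

$$\|y_{\bar V}\|_p \le \|\hat y - y\|_p \le (1+\varepsilon)\|y_{\bar U}\|_p$$

$$\|y_{\bar V}\|_p^p \le (1 + (2p - 1)\varepsilon)\|y_{\bar U}\|_p^p$$

for $p \in \{1, 2\}$ (using $\varepsilon \le 1$ when $p = 2$). Hence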



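$$|V\setminus U| \le \frac{(2p - 1)\varepsilon\|y_{\bar U}\|_p^p}{(1 + (2p - 2)\mu)(1 - 2\alpha)};$$

for $\alpha < 1/4$, this means

$$|V\setminus U| \le \frac{2\varepsilon(2p - 1)p(r + s\mu^p)}{1 + (2p - 2)\mu}.$$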

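Plugging in the parameters $p = 1$, $s = r = \frac{k}{c\varepsilon}$, $\mu = D$ gives

$$|V\setminus U| \le 2\varepsilon(r + sD) = \frac{2(1 + D)k}{c} = O\Big(\frac{k}{c}\Big).$$

Plugging in the parameters $p = 2$, $s = 2k$, $r = \frac{k}{c^2\varepsilon^2}$, $\mu = \frac{1}{c\varepsilon}$ gives

$$|V\setminus U| \le \frac{12\varepsilon(r + 2k\mu^2)}{1 + 2\mu} \le 6\varepsilon\Big(\frac{r}{\mu} + 2k\mu\Big) = \frac{18k}{c}.$$

Hence, for $c' = O(c)$, we get the parameters desired in the lemma statement, and

$$|V\setminus U| \le \frac{k}{2c}.$$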

Bob can recover $V$ with probability $1 - \delta$. Therefore he can output $\hat x$ given by $\hat x_i = 1$ if $i \in V$ and $\hat x_i = 0$ otherwise. This will differ from $x_S$ only within $(V\setminus U) \cup (U\setminus V)$, which is at most $k/c$ locations. □
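The two algebraic facts used in the proof, and the two parameter plug-ins, can be checked with exact rational arithmetic. This is a sketch with hypothetical helper names; the concrete values of $k$, $c$, $\varepsilon$, $\mu$, $\alpha$ below are arbitrary test points, and for $p = 2$ the check uses the choice $s = 2k$, $r = k/(c\varepsilon)^2$, $\mu = 1/(c\varepsilon)$.

```python
from fractions import Fraction as F


def gap_identity_holds(p, mu, alpha):
    """(mu + 1 - alpha)^p - (mu + alpha)^p == (1 + (2p - 2) mu)(1 - 2 alpha)?"""
    lhs = (mu + 1 - alpha) ** p - (mu + alpha) ** p
    return lhs == (1 + (2 * p - 2) * mu) * (1 - 2 * alpha)


def error_bound(p, eps, r, s, mu):
    """2 eps (2p - 1) p (r + s mu^p) / (1 + (2p - 2) mu), the bound on |V minus U|."""
    return 2 * eps * (2 * p - 1) * p * (r + s * mu ** p) / (1 + (2 * p - 2) * mu)


if __name__ == "__main__":
    for p in (1, 2):
        for mu in (F(0), F(3, 2), F(7)):
            for alpha in (F(0), F(1, 10), F(1, 5)):
                assert gap_identity_holds(p, mu, alpha)
    # (1 + eps)^2 <= 1 + 3 eps whenever 0 <= eps <= 1.
    for eps in (F(1, 100), F(1, 2), F(1)):
        assert (1 + eps) ** 2 <= 1 + 3 * eps
    # Hypothetical test point for p = 2.
    k, c, eps = 10, 4, F(1, 100)
    bound = error_bound(2, eps, F(k) / (c * eps) ** 2, 2 * k, 1 / (c * eps))
    print(bound, bound <= F(18 * k, c))
```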






7.1 $k > 1$



Suppose $p$, $s$, $3r$ satisfy Lemma 7.4 for some parameter $c$, and let $q = s/k$. The Gilbert-Varshamov bound implies that there exists a code $V \subset [q]^r$ with $\log|V| = \Theta(r\log q)$ and minimum Hamming distance $r/4$. Let $X \subset \{0,1\}^{qr}$ be in one-to-one correspondence with $V$: $x \in X$ corresponds to $v \in V$ when $x_{(a-1)q+b} = 1$ if and only if $v_a = b$.
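The correspondence between $V$ and $X$ is a one-hot encoding of each $q$-ary character, so every $x \in X$ has exactly $r$ ones and $\|x\|_p^p = r$. A minimal sketch with hypothetical function names, writing symbols as $1, \ldots, q$ to match the 1-indexed formula:

```python
def to_binary(v, q):
    """Map v in [q]^r (symbols 1..q) to x in {0,1}^{qr} with
    x_{(a-1)q + b} = 1 iff v_a = b (1-indexed, as in the text)."""
    x = [0] * (q * len(v))
    for a, b in enumerate(v, start=1):
        x[(a - 1) * q + b - 1] = 1  # shift to the 0-indexed list position
    return x


def from_binary(x, q):
    """Inverse of to_binary: the position of the 1 in each block of q entries."""
    r = len(x) // q
    return [x[(a - 1) * q:a * q].index(1) + 1 for a in range(1, r + 1)]


if __name__ == "__main__":
    v = [2, 1, 3]
    x = to_binary(v, 3)
    print(x, from_binary(x, 3))
```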

Let $x$ and $v$ correspond. Let $S \subseteq [r]$ with $|S| = k$, so $S$ corresponds to a set $T \subseteq [n]$ with $|T| = kq = s$. Consider arbitrary $u$ that satisfies $\|u\|_p < \alpha\|x\|_p$ and $\|u\|_\infty < \alpha$ for some small constant $\alpha < 1/4$. We would like to apply Lemma 7.3, so we just need to show that we can recover $x$ from $A(x + u)$ with constant probability. Let $\nu' = x_{\bar T} + u$, so

$$\|\nu'\|_p^p \le p(\|x_{\bar T}\|_p^p + \|u\|_p^p) \le p(r + \alpha^p r) < 3r$$

$$\|\nu'\|_\infty \le 1 + \alpha$$

$$\|\nu'_T\|_\infty \le \alpha.$$

Therefore Lemma 7.4 implies that with probability $1 - \delta$, if Bob is given $A(x_T + \nu') = A(x + u)$, he can recover $\hat x$ that agrees with $x_T$ in all but $k/c$ locations. Hence in all but $k/c$ of the $i \in S$, $\hat x_{\{(i-1)q+1, \ldots, iq\}} = x_{\{(i-1)q+1, \ldots, iq\}}$, so he can identify $v_i$. Hence Bob can recover an estimate of $v_S$ that is accurate in $(1 - 1/c)k$ characters with probability $1 - \delta$, so it agrees with $v_S$ in $(1 - 1/c)(1 - \delta)k$ characters in expectation. If we apply this in parallel to the sets $S_i = \{k(i-1) + 1, \ldots, ki\}$ for $i \in [r/k]$, we recover $(1 - 1/c)(1 - \delta)r$ characters in expectation. Hence, by Markov's inequality on the number of incorrect characters, with probability at least $1/2$ we recover more than $(1 - 2(1/c + \delta))r$ characters of $v$. If we set $\delta$ and $1/c$ to less than $1/32$, this gives that we recover all but $r/8$ characters of $v$. Since $V$ has minimum distance $r/4$, this allows us to recover $v$ (and hence $x$) exactly. By Lemma 7.3 this gives a lower bound of $m = \Omega(\log|V|) = \Omega(r\log q)$. Hence $m = \Omega(\frac{1}{\varepsilon}k\log\frac{1}{\varepsilon})$ for $\ell_1/\ell_1$ recovery and $m = \Omega(\frac{1}{\varepsilon^2}k)$ for $\ell_2/\ell_2$ recovery.
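The final decoding step relies only on the minimum distance: an estimate that is wrong in fewer than $r/8$ characters is strictly closer to $v$ than to any other codeword, since those are at distance at least $r/4$ from $v$. A toy sketch with hypothetical names, where a greedy random construction (in the spirit of the Gilbert-Varshamov argument, but not the bound itself) stands in for the code $V$:

```python
import itertools
import random


def hamming(u, v):
    """Number of positions where u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def greedy_code(q, r, d, candidates, rng):
    """Keep random words of [q]^r whose distance to every kept word is >= d.
    Greedy and randomised, in the spirit of Gilbert-Varshamov; not optimal."""
    code = []
    for _ in range(candidates):
        w = tuple(rng.randint(1, q) for _ in range(r))
        if all(hamming(w, c) >= d for c in code):
            code.append(w)
    return code


def min_distance(code):
    """Minimum pairwise Hamming distance (infinity for fewer than two words)."""
    return min((hamming(a, b) for a, b in itertools.combinations(code, 2)),
               default=float("inf"))


def decode_nearest(code, estimate):
    """Codeword closest to the estimate in Hamming distance."""
    return min(code, key=lambda c: hamming(c, estimate))


if __name__ == "__main__":
    rng = random.Random(5)
    q, r = 4, 16
    code = greedy_code(q, r, r // 4, 50, rng)
    v = code[0]
    est = list(v)
    est[3] = est[3] % q + 1  # one wrong character, fewer than r/8 = 2
    assert decode_nearest(code, est) == v
    print(len(code), min_distance(code))
```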



7.2 $k = 1$, $\delta = o(1)$



To achieve the other half of our lower bounds for sparse outputs, we restrict to the $k = 1$ case. A $k$-sparse algorithm implies a 1-sparse algorithm by inserting $k - 1$ dummy coordinates of value $\infty$, so this is valid.

Let $p$, $s$, $51r$ satisfy Lemma 7.4 for some $\alpha$ and $D$ to be determined, and let our recovery algorithm have failure probability $\delta$. Let $C = 1/(2r\delta)$ and $n = Cr$. Let $V = [(s-1)C]^r$ and let $X' \subset \{0,1\}^{(s-1)Cr}$ be the corresponding set of binary vectors. Let $X = \{0\} \times X'$ be defined by adding $x_0 = 0$ to each vector.

Now, consider arbitrary $x \in X$ and noise $\nu \in \mathbb{R}^{1+(s-1)Cr}$ with $\|\nu\|_p < \alpha\|x\|_p$ and $\|\nu\|_\infty < \alpha$ for some small constant $\alpha < 1/20$. Let $e_0/5$ be the vector that is $1/5$ at coordinate $0$ and $0$ elsewhere. Consider the sets $S_i = \{0, (s-1)(i-1) + 1, (s-1)(i-1) + 2, \ldots, (s-1)i\}$. We would like to apply Lemma 7.4 to recover $(x + \nu + e_0/5)_{S_i}$ for each $i$.
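The index sets can be generated mechanically. A sketch with hypothetical names and toy sizes, assuming $C$ is an integer: there are $Cr$ sets, each of size $s$; their nonzero parts partition the $(s-1)Cr$ coordinates; and each lies inside a single character (a run of $(s-1)C$ consecutive coordinates), which is why $\|x_{S_i}\|_1 \in \{0, 1\}$.

```python
def sets_S(s, C, r):
    """S_i = {0} union {(s-1)(i-1)+1, ..., (s-1)i} for i = 1, ..., C*r."""
    return [[0] + list(range((s - 1) * (i - 1) + 1, (s - 1) * i + 1))
            for i in range(1, C * r + 1)]


def character_of(coord, s, C):
    """1-based index of the character containing a nonzero coordinate;
    each character spans (s-1)*C consecutive coordinates."""
    return (coord - 1) // ((s - 1) * C) + 1


if __name__ == "__main__":
    s, C, r = 4, 5, 2  # hypothetical toy sizes
    for i, S in enumerate(sets_S(s, C, r), start=1):
        chars = {character_of(c, s, C) for c in S if c != 0}
        print(i, S, chars)
```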

To see what it implies, there are two cases: $\|x_{S_i}\|_1 = 1$ and $\|x_{S_i}\|_1 = 0$ (since $S_i$ lies entirely in one character, $\|x_{S_i}\|_1 \in \{0, 1\}$). In the former case, we have $\nu' = x_{\bar S_i} + \nu + e_0/5$ with

$$\|\nu'\|_p^p \le (2p - 1)\big(\|x_{\bar S_i}\|_p^p + \|\nu\|_p^p + \|e_0/5\|_p^p\big) \le 3(r + \alpha^p r + 1/5^p) < 4r$$

$$\|\nu'\|_\infty \le 1 + \alpha$$

$$\|\nu'_{S_i}\|_\infty \le 1/5 + \alpha < 1/4.$$



Hence Lemma 7.4 will, with failure probability $\delta$, recover $\hat x_{S_i}$ that differs from $x_{S_i}$ in at most $1/c < 1$ positions, so $x_{S_i}$ is correctly recovered.
Now, suppose $\|x_{S_i}\|_1 = 0$. Then we observe that Lemma 7.4 would apply to recovery from $5A(x + \nu + e_0/5)$ with $\nu' = 5x + 5\nu$ and $x' = e_0$, so

$$\|\nu'\|_p^p \le 5^p p(\|x\|_p^p + \|\nu\|_p^p) \le 5^p p(r + \alpha^p r) < 51r$$

$$\|\nu'\|_\infty \le 5 + 5\alpha$$

$$\|\nu'_{S_i}\|_\infty \le 5\alpha.$$



Hence Lemma 7.4 would recover, with failure probability $\delta$, an $\hat x'$ with support equal to $\{0\}$.
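The constants $4r$ and $51r$, and the thresholds $1/5 + \alpha < 1/4$ and $5\alpha < 1/4$, follow from $\alpha < 1/20$ and $r \ge 1$. A quick exact check at a few test points, with hypothetical helper names (the boundary value $\alpha = 1/20$ is used, so the last two thresholds hold with equality there):

```python
from fractions import Fraction as F


def first_case_bound(p, alpha, r):
    """(2p - 1)(r + alpha^p r + 5^-p): bound on ||nu'||_p^p when ||x_{S_i}||_1 = 1."""
    return (2 * p - 1) * (r + alpha ** p * r + F(1, 5 ** p))


def second_case_bound(p, alpha, r):
    """5^p p (r + alpha^p r): bound on ||nu'||_p^p when ||x_{S_i}||_1 = 0."""
    return 5 ** p * p * (r + alpha ** p * r)


if __name__ == "__main__":
    alpha = F(1, 20)  # the boundary value; the text assumes alpha < 1/20
    for p in (1, 2):
        for r in (1, 10, 1000):
            assert first_case_bound(p, alpha, r) < 4 * r
            assert second_case_bound(p, alpha, r) < 51 * r
    print(F(1, 5) + alpha, 5 * alpha)  # both at most 1/4
```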

Now, we observe that the algorithm in Lemma 7.4 is robust to scaling the input $A(x' + \nu')$ by 5; the only difference is that the effective $\mu$ changes by the same factor, which increases the number of errors $k/c$ by a factor of at most 5. Hence if $c > 5$, we can apply the algorithm once and have it work regardless of whether $\|x_{S_i}\|_1 = 0$ or $\|x_{S_i}\|_1 = 1$. If $\|x_{S_i}\|_1 = 1$ the result has support $\mathrm{supp}(x_{S_i})$, and if $\|x_{S_i}\|_1 = 0$ the result has support $\{0\}$. Thus we can recover $x_{S_i}$ exactly with failure probability $\delta$.

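If we try this on the $Cr = 1/(2\delta)$ sets $S_i$, then by a union bound we recover all of $x$ correctly with failure probability at most $1/2$. Hence Lemma 7.3 implies that $m = \Omega(\log|X|) = \Omega(r\log C)$. For $\ell_1/\ell_1$, this means $m = \Omega(\frac{1}{\varepsilon}\log\frac{1}{\delta})$; for $\ell_2/\ell_2$, this means $m = \Omega(\frac{1}{\varepsilon^2}\log\frac{1}{\delta})$.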

Acknowledgment: We thank T.S. Jayram for helpful discussions. 